1 Introduction


In this paper we consider a two-decision-maker problem where each decision maker has his own information, and we study the impact of improving the information of only one decision maker.


In [2] an example of a two-decision-maker LQG static Nash game was considered. For that particular example it was shown that, on the one hand, if one of the decision makers improves his own information by obtaining his opponent's information (while his opponent's information does not change), then he ends up with a higher Nash cost (Case B of [2]). On the other hand, if he improves his own information by getting an extra measurement not from his opponent (while his opponent's information does not change), then he might incur a lower Nash cost (Case D of [2]). In this paper we prove that in a general two-decision-maker LQG static or dynamic Nash game, if one of the decision makers knows all his opponent's information, then more or better information for him alone is beneficial to him. In static games we also prove that more information for one of the decision makers alone is beneficial to him, provided that such information is orthogonal to both decision makers' information.

The structure of this paper is as follows. In Section 2 we study static games. By introducing an orthogonality condition on the information, we give sufficient conditions under which more information is beneficial to one of the decision makers. In Section 3 we formulate a two-decision-maker LQG dynamic Nash game in which one decision maker's information is nested in the other's.

At each stage $k$, decision maker 1 may use a function of the estimates $\hat x_1(k)$ and $\hat x_3(k)$ of $x(k)$, while decision maker 2 may use a function of $\hat x_1(k)$ only. Here $\hat x_1(k)$ and $\hat x_3(k)$ are generated by two Kalman filters that use linear, noise-corrupted measurements of $x(k)$, and $\hat x_3(k)$ is a refinement of $\hat x_1(k)$. In this setup the Nash solution exists, is unique, and is linear in $\hat x_1(k)$ and $\hat x_3(k)$ under certain invertibility assumptions on some matrices. The solution has two nice features: a sort of separation principle of estimation and control holds, and the estimation error is independent of the controls.

In Section 4 we study the informational properties of the game formulated in Section 3, and we prove that better information for decision maker 1 alone is beneficial to him. In Section 5 we extend the results obtained for Nash games to Stackelberg games. In Section 6 we give two examples to illustrate the informational properties discussed in the previous sections. Finally, in Section 7 we present our conclusions.


Consider a two-decision-maker LQG static Nash game. The cost functional of decision maker $i$, $i = 1, 2$, is denoted by

$$J_i(\gamma_1, \gamma_2) = E\big[x'P_i'u_i + \tfrac{1}{2}u_i'u_i + u_i'Q_iu_j\big], \qquad j \neq i,\; i, j = 1, 2 \tag{1}$$

where $x \in \mathbb{R}^n$ is a Gaussian random vector, $x \sim N(0, \Theta)$, $u_i \in \mathbb{R}^{l_i}$ is the control variable of decision maker $i$, and $P_i$, $Q_i$ are real constant matrices of appropriate dimensions. The linear measurement of decision maker $i$ is given by

$$y_i = H_ix + \omega_i \tag{2}$$

where $H_i$ is an $m_i \times n$ real constant matrix and $\omega_i$ is a Gaussian random vector, $\omega_i \sim N(0, \Sigma_i)$, which is independent of $x$ and $\omega_j$ ($j \neq i$). The control law $\gamma_i$ is chosen from $\Gamma_i$, where $\Gamma_i$ consists of all the measurable functions from $\mathbb{R}^{m_i}$ to $\mathbb{R}^{l_i}$ such that $\gamma_i(y_i)$ is a second-order random vector.

A pair $(\gamma_1^*, \gamma_2^*)$ is called a Nash solution of the game if it satisfies the following two inequalities

$$J_1(\gamma_1^*, \gamma_2^*) \le J_1(\gamma_1, \gamma_2^*), \qquad J_2(\gamma_1^*, \gamma_2^*) \le J_2(\gamma_1^*, \gamma_2) \tag{3}$$

for every $\gamma_i \in \Gamma_i$, and $\gamma_i^*$ is called the Nash strategy of decision maker $i$. A necessary and sufficient condition characterizing a Nash solution of the above game was given in Theorem 1 of [?], which we state below as a lemma.

Lemma 2.1. A pair $(\gamma_1^*, \gamma_2^*)$ is a Nash solution of the game described above if and only if the following two equalities hold:

$$\gamma_i^*(y_i) = -P_iE[x \mid y_i] - Q_iE[\gamma_j^*(y_j) \mid y_i], \qquad j \neq i,\; i, j = 1, 2 \tag{4}$$
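Here is a minimal sketch of how (4) pins down a solution. It assumes scalar $x$, $y_i$ and $u_i$, and it restricts attention to linear strategies $\gamma_i(y_i) = a_iy_i$; that restriction is an assumption of the sketch. Under it, $E[x \mid y_i]$ and $E[y_j \mid y_i]$ are scalar multiples of $y_i$, so (4) reduces to a $2 \times 2$ linear system in $(a_1, a_2)$. The function name `scalar_linear_nash` and the parameter values are illustrative only.

With $\Theta = 1$, $h_i = 1$, unit noise variances and $P_i = Q_i = 1/2$, the sketch should reproduce the gains $-1/5$ of Example 1. Up to a positive factor and terms not involving $u_i$, the costs of Example 1 have the form (1) with these values.

```python
import numpy as np


def scalar_linear_nash(theta, h, sigma, P, Q):
    """Gains (a1, a2) of linear Nash strategies gamma_i(y_i) = a_i * y_i.

    Scalar version of (1)-(4): x ~ N(0, theta), y_i = h[i] * x + w_i,
    w_i ~ N(0, sigma[i]) independent of x and of each other.
    """
    var_y = [h[i] ** 2 * theta + sigma[i] for i in range(2)]
    k = [h[i] * theta / var_y[i] for i in range(2)]   # E[x | y_i] = k[i] * y_i
    cov_y12 = h[0] * h[1] * theta                       # E[y_1 y_2]
    c = [cov_y12 / var_y[i] for i in range(2)]          # E[y_j | y_i] = c[i] * y_i
    # (4): a_i = -P_i k_i - Q_i c_i a_j, a 2x2 linear system in (a_1, a_2).
    lhs = np.array([[1.0, Q[0] * c[0]],
                    [Q[1] * c[1], 1.0]])
    rhs = np.array([-P[0] * k[0], -P[1] * k[1]])
    return np.linalg.solve(lhs, rhs)


# Parameters matching Example 1 after rescaling its costs to the form (1).
a1, a2 = scalar_linear_nash(theta=1.0, h=(1.0, 1.0), sigma=(1.0, 1.0),
                            P=(0.5, 0.5), Q=(0.5, 0.5))
print(a1, a2)
```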

Using Lemma 2.1, we will show how the Nash solution is affected by the information available to the decision makers, and hence how the Nash performance is affected by the information structure. We need the following definition of orthogonality, together with a lemma that collects several well-known facts from estimation theory [?].

Definition 2.2. Two zero-mean Gaussian random vectors $z_1$ and $z_2$ are said to be orthogonal (denoted by $z_1 \perp z_2$) if $E[z_1z_2'] = 0$. Two sets $Z_1$ and $Z_2$ are orthogonal if $z_1 \perp z_2$ for every $z_1 \in Z_1$ and $z_2 \in Z_2$.

Lemma 2.3. Let $z_i$, $i = 1, 2, 3$, be zero-mean jointly Gaussian random vectors. Then

(i) $\{z_1 - E[z_1 \mid z_2]\} \perp z_2$;

(ii) $E[z_1 \mid z_2] = Cz_2$, where $C$ is a real matrix.

If, in addition, $z_2 \perp z_3$, then

(iii) $E[z_2 \mid z_3] = 0$;

(iv) $E[z_1 \mid z_2, z_3] = E[z_1 \mid z_2] + E[z_1 \mid z_3]$.
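The facts in Lemma 2.3 can be checked numerically with a short sketch. It assumes each zero-mean jointly Gaussian vector is written as $z = Te$ for a standard normal vector $e$. The conditional expectation is then the matrix $C = \mathrm{Cov}(z, y)\,\mathrm{Cov}(y)^{-1}$ of (ii). Orthogonality of $z_2$ and $z_3$ is imposed by letting them use disjoint components of $e$. The helper name `gain` and the random matrices are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each zero-mean jointly Gaussian vector is z = T e with e ~ N(0, I_6).
# z2 and z3 use disjoint blocks of e, so E[z2 z3'] = 0 (z2 orthogonal to z3).
T1 = rng.standard_normal((2, 6))
T2 = np.hstack([rng.standard_normal((2, 3)), np.zeros((2, 3))])
T3 = np.hstack([np.zeros((2, 3)), rng.standard_normal((2, 3))])


def gain(Tz, Ty):
    """Matrix C with E[z | y] = C y for z = Tz e and y = Ty e (Lemma 2.3 (ii))."""
    return (Tz @ Ty.T) @ np.linalg.inv(Ty @ Ty.T)


C12 = gain(T1, T2)
residual_cov = (T1 - C12 @ T2) @ T2.T                  # E[(z1 - E[z1|z2]) z2']
C_joint = gain(T1, np.vstack([T2, T3]))                # E[z1 | z2, z3]
C_split = np.hstack([gain(T1, T2), gain(T1, T3)])      # E[z1|z2] + E[z1|z3]

print(np.allclose(residual_cov, 0))    # (i)
print(np.allclose(gain(T2, T3), 0))    # (iii)
print(np.allclose(C_joint, C_split))   # (iv)
```

Property (iv) holds because the joint covariance of $(z_2, z_3)$ is block diagonal, so its inverse is block diagonal as well.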


Denote an extra measurement by $y_e$,

$$y_e = H_ex + \omega_e \tag{5}$$

where $H_e$ is an $m_e \times n$ real constant matrix and $\omega_e$ is a Gaussian random vector, $\omega_e \sim N(0, \Sigma_e)$, which is independent of $x$.

Condition C: (i) $y_e \perp \{y_1, y_2\}$; (ii) $y_2 = My_1$,

where $M$ is an $m_2 \times m_1$ matrix. Condition C (ii) means that the information provided by $y_2$ is contained in that provided by $y_1$.
Lemma 2.4. Under either one of Conditions C,

$$\{E[x \mid y_1, y_e] - E[x \mid y_1]\} \perp \{y_1, y_2\}.$$

Proof: Under Condition C (i), Lemma 2.3 (iv) and (ii) imply that

$$E[x \mid y_1, y_e] - E[x \mid y_1] = E[x \mid y_1] + E[x \mid y_e] - E[x \mid y_1] = E[x \mid y_e] = Cy_e. \tag{6}$$

The result holds since $y_e \perp \{y_1, y_2\}$.

Under Condition C (ii),

$$E[x \mid y_1, y_e] - E[x \mid y_1] = E[x \mid y_1, y_e] - E\big[E[x \mid y_1, y_e] \mid y_1\big] \tag{7}$$

and thus, by Lemma 2.3 (i), $\{E[x \mid y_1, y_e] - E[x \mid y_1]\} \perp y_1$. By Condition C (ii), $\{E[x \mid y_1, y_e] - E[x \mid y_1]\} \perp y_2$. □


The question of existence and uniqueness of the Nash solution has been studied in [3] and [?]. There it was shown that almost always there exists a unique solution, which has to be an affine function of the information.

Theorem 2.5. Let at least one of Conditions C (i), (ii) hold. Suppose there exists a Nash solution under the information pattern where decision maker 1 knows $y_1$ and decision maker 2 knows $y_2$. Then there exists a Nash solution under the information pattern where decision maker 1 knows $(y_1, y_e)$ and decision maker 2 knows $y_2$, and vice versa. Furthermore, the Nash strategy $\gamma_2$ is the same under both information patterns. When Condition C (ii) holds, a Nash solution exists and is unique if and only if the matrix $I - Q_2Q_1$ is invertible.

Proof: (i) Let Condition C (i) hold.

When decision maker 1 knows $y_1$ and decision maker 2 knows $y_2$, by Lemma 2.1 a Nash solution $(\gamma_1(y_1), \gamma_2(y_2))$ exists if and only if

$$\gamma_2(y_2) = Q_2Q_1E\big[E[\gamma_2(y_2) \mid y_1] \mid y_2\big] + Q_2P_1E\big[E[x \mid y_1] \mid y_2\big] - P_2E[x \mid y_2]. \tag{8}$$

When decision maker 1 knows $(y_1, y_e)$ and decision maker 2 knows $y_2$, a Nash solution $(\gamma_1(y_1, y_e), \gamma_2(y_2))$ exists if and only if

$$\begin{aligned}
\gamma_2(y_2) &= Q_2Q_1E\big[E[\gamma_2(y_2) \mid y_1, y_e] \mid y_2\big] + Q_2P_1E\big[E[x \mid y_1, y_e] \mid y_2\big] - P_2E[x \mid y_2] \qquad \text{(9a)}\\
&= Q_2Q_1E\big[E[\gamma_2(y_2) \mid y_1] \mid y_2\big] + Q_2P_1E\big[E[x \mid y_1] \mid y_2\big] - P_2E[x \mid y_2] \qquad \text{(9b)}
\end{aligned}$$

Here we use Lemma 2.3 (iii) and (iv) and the fact that $\gamma_2(y_2)$ is affine in $y_2$. Equations (8) and (9b) are exactly the same; hence we have the desired result.

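(ii) Let Condition C (ii) hold.

When decision maker 1 knows $y_1$ and decision maker 2 knows $y_2$, a Nash solution $(\gamma_1(y_1), \gamma_2(y_2))$ exists if and only if

$$\gamma_2(y_2) = Q_2Q_1\gamma_2(y_2) + Q_2P_1E[x \mid y_2] - P_2E[x \mid y_2]. \tag{10}$$

When decision maker 1 knows $(y_1, y_e)$ and decision maker 2 knows $y_2$, a Nash solution $(\gamma_1(y_1, y_e), \gamma_2(y_2))$ exists if and only if

$$\gamma_2(y_2) = Q_2Q_1\gamma_2(y_2) + Q_2P_1E[x \mid y_2] - P_2E[x \mid y_2]. \tag{11}$$

Here we use the fact that, since $y_2 = My_1$ is a function of $y_1$, we have $E[\gamma_2(y_2) \mid y_1] = E[\gamma_2(y_2) \mid y_1, y_e] = \gamma_2(y_2)$ and $E\big[E[x \mid y_1] \mid y_2\big] = E\big[E[x \mid y_1, y_e] \mid y_2\big] = E[x \mid y_2]$. Equations (10) and (11) are the same. Hence, if a Nash solution exists under one of the information patterns, it exists under the other, and $\gamma_2$ is the same under both information patterns. Furthermore, a unique Nash solution exists if and only if $I - Q_2Q_1$ is invertible. □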

Theorem 2.6. Let Condition C (i) or (ii) hold. Compare two information patterns. In the first, decision maker 1 knows $\{y_1, y_e\}$ and decision maker 2 knows $y_2$. In the second, decision maker 1 knows $y_1$ and decision maker 2 knows $y_2$. Then the Nash cost incurred by decision maker 1 under the first pattern is less than or equal to his Nash cost under the second.

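Proof: Let $(\gamma_1^*, \gamma_2^*)$ denote the Nash solution when decision maker 1 knows $\{y_1, y_e\}$ and decision maker 2 knows $y_2$. Let $(\gamma_1^\circ, \gamma_2^\circ)$ denote the Nash solution when decision maker 1 knows $y_1$ and decision maker 2 knows $y_2$. By Theorem 2.5, $\gamma_2^* = \gamma_2^\circ$; hence

$$J_1(\gamma_1^*, \gamma_2^*) = J_1(\gamma_1^*, \gamma_2^\circ) = \min_{\gamma_1(y_1, y_e) \in \Gamma_1'} J_1(\gamma_1, \gamma_2^\circ) \le \min_{\gamma_1(y_1) \in \Gamma_1} J_1(\gamma_1, \gamma_2^\circ) = J_1(\gamma_1^\circ, \gamma_2^\circ)$$

where $\Gamma_1'$ consists of all the measurable functions from $\mathbb{R}^{m_1 + m_e}$ to $\mathbb{R}^{l_1}$. The inequality holds because every strategy in $\Gamma_1$ can be regarded as an element of $\Gamma_1'$ that ignores $y_e$. □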

Remark 2.7. Notice that Theorems 2.5 and 2.6 hold regardless of the functional form of the costs, as long as they are quadratic.

Remark 2.8. All the results obtained in this section go through even if $x$ is not of zero mean. This is easy to verify.


3 Formulation of an LQG Dynamic Nash Game and Its Solution


Consider a two-decision-maker, $N$-stage Nash game where the state of the system $x(\cdot)$ evolves according to

$$x(k+1) = Ax(k) + B_1u_1(k) + B_2u_2(k) + \omega(k), \qquad x(0) = x_0 \tag{17}$$

where $k \in \theta_1 = \{0, 1, \ldots, N-1\}$, $x(k) \in \mathbb{R}^n$, and $u_i(k) \in \mathbb{R}^{l_i}$ denotes the control variable of decision maker $i$ at stage $k$, $i = 1, 2$. The quantities $x_0$ and $\{\omega(k), k \in \theta_1\}$ are independent Gaussian random vectors, with $x_0 \sim N(\bar x_0, \Theta_0)$ and $\omega(\cdot) \sim N(0, R)$.




At each stage $k$ there are measurements $y_i(k) \in \mathbb{R}^{m_i}$, $i = 1, 2$, given by

$$y_i(k) = H_ix(k) + v_i(k) \tag{18}$$

where $\{v_i(k), k \in \theta_1, i = 1, 2\}$ are independent Gaussian random vectors, $v_i(\cdot) \sim N(0, \Sigma_i)$. The $v_i$'s are also independent of $x_0$ and $\{\omega(k), k \in \theta_1\}$. The information available to the decision makers is not the $y_i(k)$'s, but $\hat x_1(k)$ and $\hat x_3(k)$, the estimates of $x(k)$ given by two Kalman filters:


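$$\hat x_i(k) = \hat x_i(k/k-1) + G_i(k)[y_i(k) - H_i\hat x_i(k/k-1)] \tag{19a}$$

$$\hat x_i(k+1/k) = A\hat x_i(k) + B_1u_1(k) + B_2u_2(k), \qquad \hat x_i(0/-1) = \bar x_0 \tag{19b}$$

$$G_i(k) = \Sigma_i(k/k-1)H_i'[H_i\Sigma_i(k/k-1)H_i' + \Sigma_i]^{-1} \tag{19c}$$

$$\Sigma_i(k+1/k) = A[I - G_i(k)H_i]\Sigma_i(k/k-1)A' + R, \qquad \Sigma_i(0/-1) = \Theta_0 \tag{19d}$$

$$\Sigma_i(k) = [I - G_i(k)H_i]\Sigma_i(k/k-1) \tag{19e}$$

for $i = 1, 3$, where

$$H_3 = [H_1', H_2']' \tag{20a}$$

$$y_3(\cdot) = [y_1'(\cdot), y_2'(\cdot)]' \tag{20b}$$

$$\Sigma_3 = \operatorname{diag}[\Sigma_1, \Sigma_2] \tag{20c}$$

Here $\Sigma_i$ without a time argument denotes the measurement-noise covariance. $\hat x_i(k+1/k)$ is the one-step prediction estimate, and $\Sigma_i(k)$ and $\Sigma_i(k+1/k)$ are the error covariance matrices associated with $\hat x_i(k)$ and $\hat x_i(k+1/k)$, respectively:

$$\Sigma_i(k) = E\{[x(k) - \hat x_i(k)][x(k) - \hat x_i(k)]'\} \tag{21}$$

$$\Sigma_i(k+1/k) = E\{[x(k+1) - \hat x_i(k+1/k)][x(k+1) - \hat x_i(k+1/k)]'\} \tag{22}$$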
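The covariance half of the filters (19c)–(19e) can be sketched in a few lines. These recursions involve neither the controls nor the measured values, which is also the point of Remark 3.3. The sketch runs the recursion for filter 1 (measuring $y_1$) and filter 3 (measuring the stacked $y_3$ of (20)). It then checks that $\Sigma_1(k) - \Sigma_3(k)$, the covariance of $\hat x_4(k)$ in Lemma 3.1 (iii), is positive semidefinite. The system matrices are hypothetical, and the function name is illustrative only.

```python
import numpy as np


def filtered_error_covariances(A, H, Sigma_v, R, Theta0, N):
    """Sigma(k), k = 0..N-1, from the covariance recursions (19c)-(19e)."""
    n = A.shape[0]
    S_pred = Theta0.copy()                                          # Sigma(0/-1)
    out = []
    for _ in range(N):
        G = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + Sigma_v)   # (19c)
        S_filt = (np.eye(n) - G @ H) @ S_pred                          # (19e)
        out.append(S_filt)
        S_pred = A @ S_filt @ A.T + R                                  # (19d)
    return out


# Hypothetical two-state system.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
R = 0.1 * np.eye(2)
Theta0 = np.eye(2)
H1, H2 = np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])
Sig1_v, Sig2_v = np.array([[1.0]]), np.array([[0.5]])

H3 = np.vstack([H1, H2])                                            # (20a)
Sig3_v = np.block([[Sig1_v, np.zeros((1, 1))],
                   [np.zeros((1, 1)), Sig2_v]])                     # (20c)

N = 10
Sig1 = filtered_error_covariances(A, H1, Sig1_v, R, Theta0, N)
Sig3 = filtered_error_covariances(A, H3, Sig3_v, R, Theta0, N)
# Sigma_1(k) - Sigma_3(k) should be positive semidefinite at every stage.
min_eig = min(np.linalg.eigvalsh(S1 - S3).min() for S1, S3 in zip(Sig1, Sig3))
print(min_eig >= -1e-10)
```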


The information structure is defined as follows. At each stage $k$, decision maker 1 knows $I_1(k) = \{\hat x_1(k), \hat x_3(k)\}$, while decision maker 2 knows $I_2(k) = \{\hat x_1(k)\}$. This information structure can be justified by considering two impartial referees, 1 and 3, who compute $\hat x_1(k)$ and $\hat x_3(k)$, respectively. Referee 1 gives $\hat x_1(k)$ to both decision makers, and referee 3 gives $\hat x_3(k)$ to decision maker 1 only.

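The cost of decision maker $i$ is $J_i$ $(= J_i(0))$, where $J_i(k)$ denotes the cost-to-go of decision maker $i$ at stage $k$ and is defined by

$$J_i(k) = E\Big\{\sum_{n=k}^{N-1}\big[x'(n)P_ix(n) + u_i'(n)u_i(n) + u_j'(n)Q_iu_j(n)\big] + x'(N)P_ix(N)\Big\}, \qquad j \neq i \tag{23}$$

where $P_i \ge 0$ and $Q_i \ge 0$. The control $u_i(k)$ is chosen as $\gamma_i^k(I_i(k))$, where the $\gamma_i^k$'s are measurable functions, $\gamma_1^k: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^{l_1}$ and $\gamma_2^k: \mathbb{R}^n \to \mathbb{R}^{l_2}$, with the property that $\gamma_i^k(I_i(k))$ is a second-order random vector.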

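Let

$$g_i = \{\gamma_i^0, \gamma_i^1, \ldots, \gamma_i^{N-1}\}, \qquad i = 1, 2. \tag{24}$$

A pair $(g_1^*, g_2^*)$ is called a Nash solution of the game if

$$J_1(g_1^*, g_2^*) \le J_1(g_1, g_2^*) \quad \forall \text{ admissible } g_1 \tag{25a}$$

$$J_2(g_1^*, g_2^*) \le J_2(g_1^*, g_2) \quad \forall \text{ admissible } g_2 \tag{25b}$$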



Before we give the Nash solution of the game, we need the following lemma, which shows an orthogonality in the information structure. Its proof is given in Appendix A.


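Lemma 3.1. (i) $E[\hat x_3(k) \mid \hat x_1(k)] = \hat x_1(k)$.

Let $\hat x_4(k) \triangleq \hat x_3(k) - \hat x_1(k)$. Then

(ii) $\hat x_4(k) \perp \hat x_1(k)$, and

(iii) $E[\hat x_4(k)] = 0$ and $E[\hat x_4(k)\hat x_4'(k)] = \Sigma_1(k) - \Sigma_3(k)$.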

Notice that by Lemma 3.1 the information structure $I_1(k)$ can equivalently be taken as $I_1(k) = \{\hat x_1(k), \hat x_4(k)\}$, which consists of two orthogonal elements.

The Nash solution of the game described above is provided in the following theorem, whose proof is given in Appendix B.

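Theorem 3.2. Consider the equations

$$\begin{aligned}
L_i(k) ={}& P_i + A'\big[(I + B_1B_1'L_1(k+1) + B_2B_2'L_2(k+1))^{-1}\big]'\big[L_i(k+1) + L_i(k+1)B_iB_i'L_i(k+1) + L_j(k+1)B_jQ_iB_j'L_j(k+1)\big]\\
&\times\big[I + B_1B_1'L_1(k+1) + B_2B_2'L_2(k+1)\big]^{-1}A,\\
L_i(N) ={}& P_i, \qquad j \neq i,\; i, j = 1, 2
\end{aligned}\tag{26}$$

which evolve backwards in time. We assume that the inverse of $(I + B_1B_1'L_1(k) + B_2B_2'L_2(k))$ exists for every $k \in \theta$. Then:

(i) There exists a unique Nash solution to the game, given by

$$u_1^*(k) = \gamma_1^{k*}(I_1(k)) = F_{11}(k)\hat x_1(k) + F_{14}(k)\hat x_4(k) \tag{27}$$

$$u_2^*(k) = \gamma_2^{k*}(I_2(k)) = F_2(k)\hat x_1(k) \tag{28}$$

where

$$F_{11}(k) = -B_1'L_1(k+1)[I + B_1B_1'L_1(k+1) + B_2B_2'L_2(k+1)]^{-1}A \tag{29}$$

$$F_{14}(k) = -B_1'L_1(k+1)[I + B_1B_1'L_1(k+1)]^{-1}A \tag{30}$$

$$F_2(k) = -B_2'L_2(k+1)[I + B_1B_1'L_1(k+1) + B_2B_2'L_2(k+1)]^{-1}A \tag{31}$$

(ii) The cost-to-go of decision maker $i$ at stage $k$ is

$$J_i(k) = E[\hat x_3'(k)L_i(k)\hat x_3(k)] + K_i(k) \tag{32}$$

where

$$K_i(k) = \operatorname{tr}\big\{[A'L_i(k+1)A - L_{i4}(k) + P_i]\Sigma_3(k) - L_i(k+1)[\Sigma_3(k+1) - R] + L_{i4}(k)\Sigma_1(k)\big\} + K_i(k+1), \qquad K_i(N) = \operatorname{tr}\{P_i\Sigma_3(N)\} \tag{33}$$

$$L_{14}(k) = A'\big[(I + B_1B_1'L_1(k+1))^{-1}\big]'\big[L_1(k+1) + L_1(k+1)B_1B_1'L_1(k+1)\big]\big[I + B_1B_1'L_1(k+1)\big]^{-1}A - L_1(k) + P_1 \tag{34}$$

$$L_{24}(k) = A'\big[(I + B_1B_1'L_1(k+1))^{-1}\big]'\big[L_2(k+1) + L_1(k+1)B_1Q_2B_1'L_1(k+1)\big]\big[I + B_1B_1'L_1(k+1)\big]^{-1}A - L_2(k) + P_2 \tag{35}$$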
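The backward recursion of Theorem 3.2 is straightforward to evaluate numerically. The sketch below computes $L_i(k)$ from (26), the gains (29)–(31), and $L_{14}$, $L_{24}$ from (34)–(35) for a hypothetical two-state example. It then checks the identity $P_1 + A'L_1(k+1)A - L_1(k) - L_{14}(k) = U'VU$ used in the proof of Corollary 4.4, with $U$ and $V$ as in (46)–(47). The function name and all numbers are illustrative.

The sketch simply assumes that the invertibility required in Theorem 3.2 holds for these values; `np.linalg.inv` would raise an error otherwise.

```python
import numpy as np


def nash_gains(A, B1, B2, P1, P2, Q1, Q2, N):
    """Backward recursion (26), gains (29)-(31) and L_14, L_24 from (34)-(35)."""
    I = np.eye(A.shape[0])
    L1, L2 = {N: P1}, {N: P2}                      # terminal conditions L_i(N) = P_i
    F11, F14, F2, L14, L24 = {}, {}, {}, {}, {}
    for k in range(N - 1, -1, -1):
        La, Lb = L1[k + 1], L2[k + 1]
        M_inv = np.linalg.inv(I + B1 @ B1.T @ La + B2 @ B2.T @ Lb)
        W_inv = np.linalg.inv(I + B1 @ B1.T @ La)
        L1[k] = P1 + A.T @ M_inv.T @ (La + La @ B1 @ B1.T @ La
                                      + Lb @ B2 @ Q1 @ B2.T @ Lb) @ M_inv @ A
        L2[k] = P2 + A.T @ M_inv.T @ (Lb + Lb @ B2 @ B2.T @ Lb
                                      + La @ B1 @ Q2 @ B1.T @ La) @ M_inv @ A
        F11[k] = -B1.T @ La @ M_inv @ A            # (29)
        F14[k] = -B1.T @ La @ W_inv @ A            # (30)
        F2[k] = -B2.T @ Lb @ M_inv @ A             # (31)
        L14[k] = (A.T @ W_inv.T @ (La + La @ B1 @ B1.T @ La) @ W_inv @ A
                  - L1[k] + P1)                    # (34)
        L24[k] = (A.T @ W_inv.T @ (Lb + La @ B1 @ Q2 @ B1.T @ La) @ W_inv @ A
                  - L2[k] + P2)                    # (35)
    return L1, L2, F11, F14, F2, L14, L24


# Hypothetical two-state, scalar-control example.
A = np.array([[0.9, 0.2], [0.0, 0.7]])
B1 = np.array([[1.0], [0.5]])
B2 = np.array([[0.0], [1.0]])
P1 = P2 = np.eye(2)
Q1 = Q2 = np.array([[0.5]])
N = 5
L1, L2, F11, F14, F2, L14, L24 = nash_gains(A, B1, B2, P1, P2, Q1, Q2, N)

# Identity (45): P1 + A'L1(k+1)A - L1(k) - L14(k) = U'VU, with U, V from (46)-(47).
residuals, min_eigs = [], []
for k in range(N):
    La = L1[k + 1]
    U = np.linalg.inv(np.eye(2) + B1 @ B1.T @ La) @ A
    V = La @ B1 @ B1.T @ La + La @ B1 @ B1.T @ La @ B1 @ B1.T @ La
    lhs = P1 + A.T @ La @ A - L1[k] - L14[k]
    residuals.append(np.abs(lhs - U.T @ V @ U).max())
    min_eigs.append(np.linalg.eigvalsh((lhs + lhs.T) / 2).min())
print(max(residuals), min(min_eigs))
```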

Remark 3.3. Notice that the control laws $F_{11}(k)$, $F_{14}(k)$ and $F_2(k)$ in the above theorem are independent of the observation noise in the measurements (18). That is, a sort of separation principle holds under this information structure. Also, the Kalman filter equations (19) show that the estimation error covariances $\Sigma_i(k)$, $i = 1, 3$, are independent of the controls.


Remark 3.4. Compare Theorem 3.2 (where $I_1(k) = \{\hat x_1(k), \hat x_3(k)\}$ and $I_2(k) = \{\hat x_1(k)\}$) with Theorem 2.1 (where $I_1(k) = \{\hat x_1(k)\}$ and $I_2(k) = \{\hat x_1(k)\}$, with $M = 2$). We see that $\gamma_2^{k*}$, the Nash strategy of decision maker 2, is the same in the two information structures.

Theorems 2.5 and 2.6 show that more information for the decision maker who knows all his opponent's information is beneficial to him. Motivated by this, we expect the extra information $\hat x_3(k)$ ($I_1(k)$ compared with $I_2(k)$ in Theorem 3.2) to be beneficial to decision maker 1. This is indeed the case, as will be shown in the following section.

Remark 3.5. The nonsingularity of the matrix $I + B_1B_1'L_1(k) + B_2B_2'L_2(k)$ and the boundedness of $L_i(k)$, the solution of the coupled Riccati equations (26), were discussed in Theorem 2.… and Remark 2 of [?].


4 Some Informational Properties of LQG Dynamic Nash Games

In this section we first define "better information for decision maker 1 alone." We then compare the Nash costs of both decision makers under two different information structures, and prove that better information for decision maker 1 alone is beneficial to him. We also derive a sufficient condition under which better information for decision maker 1 alone is beneficial to decision maker 2.

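Consider Information I and II. In Information I the estimates $\hat x_1^{\mathrm{I}}(k)$ and $\hat x_3^{\mathrm{I}}(k)$ are generated through the past controls and the measurements

$$y_i^{\mathrm{I}}(\cdot) = H_i^{\mathrm{I}}x(\cdot) + v_i^{\mathrm{I}}(\cdot), \qquad v_i^{\mathrm{I}} \sim N(0, \Sigma_i^{\mathrm{I}}), \quad i = 1, 2, \tag{36}$$

with corresponding estimation errors $\Sigma_1^{\mathrm{I}}(k)$ and $\Sigma_3^{\mathrm{I}}(k)$. In Information II the estimates $\hat x_1^{\mathrm{II}}(k)$ and $\hat x_3^{\mathrm{II}}(k)$ are generated through the past controls and the measurements

$$y_i^{\mathrm{II}}(\cdot) = H_i^{\mathrm{II}}x(\cdot) + v_i^{\mathrm{II}}(\cdot), \qquad v_i^{\mathrm{II}} \sim N(0, \Sigma_i^{\mathrm{II}}), \quad i = 1, 2, \tag{37}$$

with corresponding estimation errors $\Sigma_1^{\mathrm{II}}(k)$ and $\Sigma_3^{\mathrm{II}}(k)$.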

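Definition 4.1. Information I provides better information for decision maker 1 alone than Information II if all of the following hold:

- $\Sigma_1^{\mathrm{I}}(k) = \Sigma_1^{\mathrm{II}}(k)$ for every $k \in \theta_1$;
- $\Sigma_3^{\mathrm{I}}(k) \le \Sigma_3^{\mathrm{II}}(k)$ for every $k \in \theta_1$;
- $\Sigma_3^{\mathrm{I}}(k) \neq \Sigma_3^{\mathrm{II}}(k)$ for at least one $k \in \theta_1$.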

It follows directly from this definition that all the improvement is in $\hat x_4(\cdot)$, decision maker 1's private information. There is no improvement in $\hat x_1(\cdot)$, the public information of both decision makers.

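Let $J_i^{\mathrm{I}}(k)$ and $K_i^{\mathrm{I}}(k)$, $i = 1, 2$, be defined as in (32) and (33) for Information I, and let

$$\Pi_j^{\mathrm{I}}(k) = E\big[\hat x_j^{\mathrm{I}}(k)\,(\hat x_j^{\mathrm{I}}(k))'\big], \qquad j = 1, 3.$$

Similarly, we define $J_i^{\mathrm{II}}(k)$, $K_i^{\mathrm{II}}(k)$, $i = 1, 2$, and $\Pi_j^{\mathrm{II}}(k)$, $j = 1, 3$, for Information II.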

Theorem 4.3. The Nash solution given by Theorem 3.2 has the property that better information for decision maker 1 alone does not increase decision maker $i$'s cost if

$$P_i + A'L_i(k+1)A - L_i(k) - L_{i4}(k) \ge 0 \quad \text{for every } k \in \theta_1, \tag{38}$$

and it lowers decision maker $i$'s cost if the inequality in (38) is strict.

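Proof: From part (ii) of Theorem 3.2,

$$J_i^{\mathrm{I}}(0) = E\big[(\hat x_3^{\mathrm{I}}(0))'L_i(0)\hat x_3^{\mathrm{I}}(0)\big] + K_i^{\mathrm{I}}(0) = \operatorname{tr}\{L_i(0)\Pi_3^{\mathrm{I}}(0)\} + K_i^{\mathrm{I}}(0). \tag{39}$$

From the recursive expression of $K_i(\cdot)$ in (33) we obtain

$$\begin{aligned}
J_i^{\mathrm{I}}(0) = \operatorname{tr}\Big\{&L_i(0)\Pi_3^{\mathrm{I}}(0) + [P_i + A'L_i(1)A - L_{i4}(0)]\Sigma_3^{\mathrm{I}}(0) + L_i(1)R + L_{i4}(0)\Sigma_1^{\mathrm{I}}(0)\\
&+ \sum_{k=1}^{N-1}\big([P_i + A'L_i(k+1)A - L_i(k) - L_{i4}(k)]\Sigma_3^{\mathrm{I}}(k) + L_i(k+1)R + L_{i4}(k)\Sigma_1^{\mathrm{I}}(k)\big)\Big\}.
\end{aligned}\tag{40}$$

Similarly,

$$\begin{aligned}
J_i^{\mathrm{II}}(0) = \operatorname{tr}\Big\{&L_i(0)\Pi_3^{\mathrm{II}}(0) + [P_i + A'L_i(1)A - L_{i4}(0)]\Sigma_3^{\mathrm{II}}(0) + L_i(1)R + L_{i4}(0)\Sigma_1^{\mathrm{II}}(0)\\
&+ \sum_{k=1}^{N-1}\big([P_i + A'L_i(k+1)A - L_i(k) - L_{i4}(k)]\Sigma_3^{\mathrm{II}}(k) + L_i(k+1)R + L_{i4}(k)\Sigma_1^{\mathrm{II}}(k)\big)\Big\}.
\end{aligned}\tag{41}$$

We use the fact that

$$\Pi_3^{\mathrm{II}}(0) - \Pi_3^{\mathrm{I}}(0) = -[\Sigma_3^{\mathrm{II}}(0) - \Sigma_3^{\mathrm{I}}(0)], \tag{42}$$

which holds because $\Pi_3(0) + \Sigma_3(0) = E[x(0)x'(0)]$ under either information. We also use the fact that $L_i(k)$ and $L_{i4}(k)$ do not depend on the information. Together these give

$$J_i^{\mathrm{II}}(0) - J_i^{\mathrm{I}}(0) = \sum_{k=0}^{N-1}\operatorname{tr}\Big\{[P_i + A'L_i(k+1)A - L_i(k) - L_{i4}(k)][\Sigma_3^{\mathrm{II}}(k) - \Sigma_3^{\mathrm{I}}(k)] + L_{i4}(k)[\Sigma_1^{\mathrm{II}}(k) - \Sigma_1^{\mathrm{I}}(k)]\Big\}. \tag{43}$$

Suppose now that Information I provides better information for decision maker 1 alone than Information II. Then Lemma 4.2 implies $J_i^{\mathrm{II}}(0) \ge J_i^{\mathrm{I}}(0)$ if

$$P_i + A'L_i(k+1)A - L_i(k) - L_{i4}(k) \ge 0 \quad \text{for every } k \in \theta_1, \tag{44}$$

and $J_i^{\mathrm{II}}(0) > J_i^{\mathrm{I}}(0)$ if the inequality in (44) is strict. □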
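The sign argument behind (43)–(44) needs two ingredients. Under Definition 4.1 the $\Sigma_1$ terms vanish. Each remaining term is then the trace of a product of two positive semidefinite matrices, which is nonnegative. We assume this trace fact is the content of Lemma 4.2, which is not reproduced in this section.

The sketch below evaluates the right-hand side of (43) for synthetic positive semidefinite matrices that do not come from any particular game. It illustrates the sign argument; it does not compute an actual cost. The names `cost_gap` and `random_psd` are hypothetical.

```python
import numpy as np


def cost_gap(M, dS3, L4, dS1):
    """Right-hand side of (43), i.e. J_i^II(0) - J_i^I(0).

    M[k]   = P_i + A'L_i(k+1)A - L_i(k) - L_i4(k)
    dS3[k] = Sigma_3^II(k) - Sigma_3^I(k)
    L4[k]  = L_i4(k)
    dS1[k] = Sigma_1^II(k) - Sigma_1^I(k)
    """
    return sum(np.trace(M[k] @ dS3[k] + L4[k] @ dS1[k]) for k in range(len(M)))


def random_psd(n, rng):
    G = rng.standard_normal((n, n))
    return G @ G.T                                   # G G' is positive semidefinite


rng = np.random.default_rng(0)
n, N = 3, 6
M = [random_psd(n, rng) for _ in range(N)]            # condition (44) holds
dS3 = [random_psd(n, rng) for _ in range(N)]          # Sigma_3^I(k) <= Sigma_3^II(k)
L4 = [rng.standard_normal((n, n)) for _ in range(N)]  # arbitrary: multiplied by zero
dS1 = [np.zeros((n, n)) for _ in range(N)]            # Sigma_1^I(k) = Sigma_1^II(k)

gap = cost_gap(M, dS3, L4, dS1)
print(gap >= 0)
```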

Corollary 4.4. Better information for decision maker 1 alone does not increase decision maker 1's Nash cost. It lowers decision maker 1's Nash cost provided that the matrices $A$, $B_1B_1'$ and $P_1$ are nonsingular.

Proof: Substituting (26) and (34) into (44) with $i = 1$, we obtain

$$\begin{aligned}
P_1 + A'L_1(k+1)A - L_1(k) - L_{14}(k) &= A'L_1(k+1)A - A'\big[(I + B_1B_1'L_1(k+1))^{-1}\big]'\big[L_1(k+1) + L_1(k+1)B_1B_1'L_1(k+1)\big]\big[I + B_1B_1'L_1(k+1)\big]^{-1}A\\
&= U'VU \ge 0
\end{aligned}\tag{45}$$

where

$$U = [I + B_1B_1'L_1(k+1)]^{-1}A \tag{46}$$

and

$$V = L_1(k+1)B_1B_1'L_1(k+1) + L_1(k+1)B_1B_1'L_1(k+1)B_1B_1'L_1(k+1) \ge 0. \tag{47}$$

Furthermore, if $P_1 > 0$, then (26) implies that $L_1(k) > 0$. Hence $U$ is nonsingular and $V > 0$ provided that $A$, $B_1B_1'$ and $P_1$ are nonsingular. Theorem 4.3 then implies the desired result. □

Remark 4.5. Notice the resemblance of equation (47) to (3d) of [?]. The reason is that $\hat x_4(\cdot)$ is orthogonal to decision maker 2's information. Any improvement in $\hat x_4(\cdot)$ is therefore used entirely by decision maker 1 to optimize his performance, which brings forth the team-like benefit.

Remark 4.6. Corollary 4.4 shows that better information for decision maker 1 alone is beneficial to him. This benefit does not depend on the number of stages $N$, and the "better" information need not be "dynamically better." In contrast with Theorem …, these two features reveal the essential difference between improving the decision makers' "private" information and improving their "public" information in a dynamic Nash game.


5 Related Properties of Static and Feedback Stackelberg Games

In this section we extend the results obtained for Nash games to static and feedback Stackelberg games. A Stackelberg game differs from a Nash game partly in that the roles of the decision makers are asymmetric, whereas in a Nash game they are symmetric. However, the Stackelberg solution of a static game is also a Nash solution of the same problem under explicit control sharing. Likewise, a feedback Stackelberg solution of an $N$-stage dynamic game is also a Nash solution of a $2N$-stage game, as observed in [6]. Hence we expect Stackelberg games to share some properties with Nash games and to differ from them in others.

Consider a two-decision-maker static Stackelberg game. Let decision maker 1 be the leader and decision maker 2 the follower. Their cost functionals are $J_1(\gamma_1, \gamma_2)$ and $J_2(\gamma_1, \gamma_2)$, respectively, where

$$J_i(\gamma_1, \gamma_2) = E\big[\tfrac12u_i'u_i + \tfrac12u_j'P_iu_j + u_i'Q_iu_j + u_i'S_{ii}x + u_j'S_{ij}x\big], \qquad j \neq i,\; i, j = 1, 2. \tag{48}$$

Here $x \in \mathbb{R}^n$ is a Gaussian random vector, $x \sim N(0, \Theta)$, $u_i \in \mathbb{R}^{l_i}$ is the control variable of decision maker $i$, and $P_i$, $Q_i$, $S_{ii}$ and $S_{ij}$ are real constant matrices of appropriate dimensions. The linear measurement of decision maker $i$ is given by


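$$y_i = H_ix + \omega_i \tag{49}$$

where $H_i$ is an $m_i \times n$ real constant matrix and $\omega_i$ is a Gaussian random vector, $\omega_i \sim N(0, \Sigma_i)$, which is independent of $x$. The control law $\gamma_i$ is chosen from $\Gamma_i$, where $\Gamma_i$ consists of all the measurable functions mapping $\mathbb{R}^{m_i}$ to $\mathbb{R}^{l_i}$ such that $\gamma_i(y_i)$ is a second-order random vector. A pair $(\gamma_1^*, \gamma_2^*)$ is called a Stackelberg solution with decision maker 1 as the leader if $\gamma_1^*$ satisfies the inequality

$$\sup_{\gamma_2 \in R_2(\gamma_1^*)} J_1(\gamma_1^*, \gamma_2) \le \sup_{\gamma_2 \in R_2(\gamma_1)} J_1(\gamma_1, \gamma_2) \tag{50}$$

for every $\gamma_1 \in \Gamma_1$, and $\gamma_2^* \in R_2(\gamma_1^*)$. Here $R_2(\gamma_1)$ is called the rational reaction set of the follower to the strategy $\gamma_1$ announced by the leader, and it is defined by

$$R_2(\gamma_1) = \{\gamma_2^\circ \in \Gamma_2 : J_2(\gamma_1, \gamma_2^\circ) \le J_2(\gamma_1, \gamma_2)\ \ \forall \gamma_2 \in \Gamma_2\}. \tag{51}$$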

Notice that if $R_2(\gamma_1)$ is a singleton for each $\gamma_1 \in \Gamma_1$, then (50) can equivalently be written as

$$J_1(\gamma_1^*, \gamma_2(\gamma_1^*)) \le J_1(\gamma_1, \gamma_2(\gamma_1)). \tag{52}$$

It turns out that $R_2(\gamma_1)$ is indeed a singleton [?], given by

$$\gamma_2(\gamma_1, y_2) = -S_{22}E[x \mid y_2] - Q_2E[\gamma_1(y_1) \mid y_2]. \tag{53}$$


A sufficient condition for a unique linear Stackelberg solution to exist was given in [?]. This condition is determined by the cost matrices (including $Q_i$, $i = 1, 2$) and has nothing to do with the information available to the decision makers. In the following derivations we assume that a unique linear Stackelberg solution exists under every information structure we consider. The result of the following lemma is known, but we include a short proof for completeness.

Lemma 5.1. The leader's cost does not increase if he has an extra measurement $y_e$ available.

Proof: Let $(\gamma_1^*, \gamma_2^*)$ and $(\gamma_1^\circ, \gamma_2^\circ)$ denote the Stackelberg solution before and after, respectively, the leader acquires $y_e$. After the leader acquires $y_e$, he can choose the suboptimal strategy $\gamma_1(y_1, y_e) \equiv \gamma_1^*(y_1)$. The follower then reacts by choosing $\gamma_2(y_2) = \gamma_2^*(y_2)$, and hence

$$J_1(\gamma_1^\circ(y_1, y_e), \gamma_2^\circ(y_2)) \le J_1(\gamma_1(y_1, y_e), \gamma_2(y_2)) = J_1(\gamma_1^*(y_1), \gamma_2^*(y_2)). \tag{54}$$

□

The follower, who is at the lower level of the hierarchy, sees things differently from the leader, and knowing more is not necessarily beneficial to him. As in the Nash case, we prove in the following theorems that an extra measurement $y_e$ is beneficial to the follower in two situations: when $y_e$ satisfies certain orthogonality conditions, or when the follower knows all that the leader knows.




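Condition $\tilde C$: (i) $y_e \perp \{y_1, y_2\}$; (ii) $y_1 = My_2$.

Theorem 5.2. If the follower acquires an extra measurement $y_e$ such that either one of Conditions $\tilde C$ holds, then the leader's strategy does not change.

Proof: Let $\gamma_1^*$ and $\gamma_1^\circ$ denote the leader's strategy before and after the follower acquires $y_e$. Let $\gamma_2(\gamma_1, y_2)$ and $\gamma_2(\gamma_1, y_2, y_e)$ denote the follower's reaction before and after, respectively, he acquires $y_e$. Then by (53)

$$\gamma_2(\gamma_1, y_2) = -S_{22}E[x \mid y_2] - Q_2E[\gamma_1(y_1) \mid y_2] \tag{55}$$

and

$$\gamma_2(\gamma_1, y_2, y_e) = -S_{22}E[x \mid y_2, y_e] - Q_2E[\gamma_1(y_1) \mid y_2, y_e]. \tag{56}$$

Under either one of Conditions $\tilde C$ the following is true:

$$E[\gamma_1(y_1) \mid y_2, y_e] = E[\gamma_1(y_1) \mid y_2]. \tag{57}$$

Under (i), $y_e$ is independent of $(y_1, y_2)$, since the variables are jointly Gaussian and orthogonal. Under (ii), $\gamma_1(y_1) = \gamma_1(My_2)$ is a function of $y_2$. Hence (56) can be written as

$$\gamma_2(\gamma_1, y_2, y_e) = \gamma_2(\gamma_1, y_2) - S_{22}\{E[x \mid y_2, y_e] - E[x \mid y_2]\} = \gamma_2(\gamma_1, y_2) - S_{22}\tilde y \tag{58}$$

where $\tilde y = E[x \mid y_2, y_e] - E[x \mid y_2]$. By Lemma 2.4, with the roles of $y_1$ and $y_2$ interchanged, $\tilde y$ is orthogonal to $y_1$ and $y_2$.

The leader's strategy after the follower acquires $y_e$ is given below. To avoid tedious expressions, we omit the arguments of the strategies $\gamma_1(\cdot)$ and $\gamma_2(\cdot)$ and write $\gamma_2$ for $\gamma_2(\gamma_1, y_2)$:

$$\begin{aligned}
\gamma_1^\circ &= \arg\min_{\gamma_1(y_1) \in \Gamma_1} E\big\{\tfrac12\gamma_1'\gamma_1 + \tfrac12(\gamma_2 - S_{22}\tilde y)'P_1(\gamma_2 - S_{22}\tilde y) + \gamma_1'Q_1(\gamma_2 - S_{22}\tilde y) + \gamma_1'S_{11}x + (\gamma_2 - S_{22}\tilde y)'S_{12}x\big\}\\
&= \arg\min_{\gamma_1(y_1) \in \Gamma_1} E\big\{\tfrac12\gamma_1'\gamma_1 + \tfrac12\gamma_2'P_1\gamma_2 - \gamma_2'P_1S_{22}\tilde y + \tfrac12(S_{22}\tilde y)'P_1(S_{22}\tilde y) + \gamma_1'Q_1\gamma_2 - \gamma_1'Q_1S_{22}\tilde y\\
&\qquad\qquad + \gamma_1'S_{11}x + \gamma_2'S_{12}x - (S_{22}\tilde y)'S_{12}x\big\}\\
&= \arg\min_{\gamma_1(y_1) \in \Gamma_1} E\big\{\tfrac12\gamma_1'\gamma_1 + \tfrac12\gamma_2'P_1\gamma_2 + \gamma_1'Q_1\gamma_2 + \gamma_1'S_{11}x + \gamma_2'S_{12}x\big\}
\end{aligned}\tag{59}$$

In taking the expectations, we use the orthogonality conditions to get rid of the terms $\gamma_2'P_1S_{22}\tilde y$ and $\gamma_1'Q_1S_{22}\tilde y$, and we drop the terms that do not depend on $\gamma_1$. The last expression is the leader's problem before the follower acquires $y_e$; hence $\gamma_1^\circ = \gamma_1^*$. □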
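The following numerical sketch of Theorem 5.2 rests on assumptions that are not in the text. All variables are scalar, and the leader is restricted to linear strategies $\gamma_1(y_1) = ay_1$. The measurement $y_e = x - \omega_1 - \omega_2$ is borrowed from Example 1, so Condition $\tilde C$ (i) holds. The cost coefficients are hypothetical.

With the follower's reaction taken from (53), the leader's cost (48) is quadratic in $a$. Its minimizer is computed with and without $y_e$ in the follower's information, and by Theorem 5.2 the two should agree. The helper names `cond_exp` and `leader_gain` are illustrative only.

```python
import numpy as np

# Primitives x, w1, w2: independent, zero-mean, unit-variance Gaussians.
# A scalar random variable is a coefficient vector over (x, w1, w2),
# so E[a * b] is the dot product a @ b.
x, w1, w2 = np.eye(3)
y1, y2 = x + w1, x + w2
ye = x - w1 - w2                 # orthogonal to y1 and y2: Condition C~ (i)

# Hypothetical scalar cost coefficients for (48).
P1, Q1, Q2, S11, S12, S22 = 1.0, 0.5, 0.3, 1.0, 0.5, 1.0


def cond_exp(target, *obs):
    """E[target | obs] for zero-mean jointly Gaussian variables (linear projection)."""
    Y = np.array(obs)                           # rows: coefficient vectors of obs
    coeffs = np.linalg.solve(Y @ Y.T, Y @ target)
    return coeffs @ Y


def leader_gain(follower_obs):
    """Optimal a in gamma_1(y1) = a * y1 when the follower reacts via (53)."""
    # (53): u2 = -S22 E[x | obs] - Q2 * a * E[y1 | obs] = base2 + a * d2.
    base2 = -S22 * cond_exp(x, *follower_obs)
    d2 = -Q2 * cond_exp(y1, *follower_obs)
    # Leader cost (48) is alpha * a^2 + beta * a + const; minimize over a.
    alpha = 0.5 * (y1 @ y1) + 0.5 * P1 * (d2 @ d2) + Q1 * (y1 @ d2)
    beta = P1 * (base2 @ d2) + Q1 * (y1 @ base2) + S11 * (y1 @ x) + S12 * (d2 @ x)
    assert alpha > 0, "leader cost must be strictly convex in a"
    return -beta / (2 * alpha)


a_before = leader_gain((y2,))       # follower observes y2 only
a_after = leader_gain((y2, ye))     # follower also observes ye
print(a_before, a_after)
```

Because $y_e$ is orthogonal to both $y_1$ and $y_2$, it changes only the $-S_{22}\tilde y$ part of the follower's reaction. Those terms drop out of the linear coefficient of the leader's cost.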


Theorem 5.3. If the follower acquires an extra measurement $y_e$ such that either one of Conditions $\tilde C$ holds, then the follower's cost does not increase.

Proof: The proof is similar to that of Theorem 2.6 and is therefore omitted.


Now consider a feedback Stackelberg game with the same formulation as the feedback Nash game of Section 3, except that we consider two cases corresponding to two different information structures. Let $I_i(k)$ denote the information available to decision maker $i$ at stage $k$. Then:

Case A: $I_1^A(k) = \{\hat x_1(k), \hat x_3(k)\}$, $I_2^A(k) = \{\hat x_1(k)\}$.

Case B: $I_1^B(k) = \{\hat x_1(k)\}$, $I_2^B(k) = \{\hat x_1(k), \hat x_3(k)\}$.




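Let us call decision maker 1 the leader and decision maker 2 the follower. A pair $(g_1^*, g_2^*)$ is a feedback Stackelberg solution to the game if

$$\sup_{\gamma_2^k \in R_2^k(\gamma_1^{k*})} J_1(\gamma_1^{k*}, \gamma_2^k; g_{1k}^*, g_{2k}^*) \le \sup_{\gamma_2^k \in R_2^k(\gamma_1^k)} J_1(\gamma_1^k, \gamma_2^k; g_{1k}^*, g_{2k}^*)$$

for every admissible $\gamma_1^k$, where

$$g_{ik}^* = \{\gamma_i^{j*},\ j = k+1, \ldots, N-1\}, \qquad i = 1, 2.$$

$R_2^k(\gamma_1^k)$ is called the rational reaction set of the follower at stage $k$ to the strategy $\gamma_1^k$ announced by the leader, and it is defined by

$$R_2^k(\gamma_1^k) = \{\gamma_2^{k\circ} : J_2(\gamma_1^k, \gamma_2^{k\circ}; g_{1k}^*, g_{2k}^*) \le J_2(\gamma_1^k, \gamma_2^k; g_{1k}^*, g_{2k}^*)\ \ \forall \text{ admissible } \gamma_2^k\}.$$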

The feedback Stackelberg solutions for Cases A and B are provided in Appendix C.

Let Information I and II be defined as in Section 4 and satisfy the condition in Definition 4.1. Then in Case A, Information I provides better information for the leader alone than Information II. In Case B, Information I provides better information for the follower alone than Information II. We have the following theorem.

Theorem 5.4. Under the information structures of Cases A and B, the feedback Stackelberg solution has the following properties:

(i) Better information for the leader alone is beneficial to the leader.

(ii) Better information for the follower alone is beneficial to the follower.

Proof: One way to prove this theorem is to use the connection between the feedback Stackelberg solution and the feedback Nash solution, following the procedure of [6]. There it was proved that a feedback Stackelberg solution of an $N$-stage dynamic game is also a feedback Nash solution of a $2N$-stage dynamic game, and the result then follows from Corollary 4.4. An independent proof of this theorem is provided in Appendix D.

Remark 5.5. A similar feedback Stackelberg game was studied in [8], but the expressions obtained there for the solution were so complicated that its informational properties could not be investigated. Those expressions could have been simplified had the authors of [8] observed the orthogonality condition in the information structure, i.e., Lemma 3.1 (ii) of this paper.

6 Examples

Example _ 1_  This  example  Illustrates  Theorem  2.  T  and  2.  ( 

under  Condition  C  (i).  Consider  a  static  Nash  game  where  all 
the  notations  follow  those  defined  in  Section  X 


J1(YrY2)  *  E[(x  +  Ul+u2)2  +  uJ] 

J2^Y1'  y2*  *  EKX  +  ui+  u2^  +  u2  1 

Decision  maker  i  has  measurement  y.,  y^  =  x  +  u>..  x,  u>j  and  u>2 
are  independent  random  variables  with  zero  mean  and  unit  variance. 

This  example  was  previously  considered  in  [ij  and  the  Nash 

*  1  *  1 
solution  was  given  by  Yj(yj)  =  “  5  yj-*14*!  Y2(y2)  =  *  5  y 2 

corresponding  Nash  costs  J^Yj.  Y2)  s  J2*Y1#  Y2*  *  900*  Now  **  in 

addition  to  y2»  decision  maker  2  acquires  extra  measurement  yfi, 

what  is  the  impact  to  his  Nash  cost?  It  was  shown  (Case  B  of  [2.]) 

that  if  y  =  y.  then  decision  maker  2  incurs  higher  Nash  cost.  In 

the  following  we  will  find  a  y^  such  that  ye  J.  {  anc*  demon* 

strate  that  this  y^  will  lower  decision  maker  2's  Nash  cost. 

Let $y_e = x - \omega_1 - \omega_2$; then it is easy to check that $y_e \perp (y_1, y_2)$, since $E[y_e y_1] = E[y_e y_2] = 1 - 1 = 0$. Denote the Nash solution after decision maker 2 acquires this $y_e$ by $(\gamma_1^0, \gamma_2^0)$; then by direct calculation we obtain

$$\gamma_1^0(y_1) = -\tfrac{1}{5}y_1$$

and

$$\gamma_2^0(y_2, y_e) = -\tfrac{1}{5}y_2 - \tfrac{1}{6}y_e.$$

The corresponding Nash cost of decision maker 2 is

$$J_2(\gamma_1^0, \gamma_2^0) = \frac{318}{900} < \frac{468}{900} = J_2(\gamma_1^*, \gamma_2^*).$$
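The direct calculation in Example 1 can be reproduced exactly with rational arithmetic. The sketch below is a minimal check, assuming (as in the LQG setting) that $x$, $\omega_1$, $\omega_2$ are jointly Gaussian, so that decision maker $i$'s first-order condition $E[x + u_1 + u_2 + u_i \mid \text{his information}] = 0$ reduces to orthogonality against each of his measurements. Random variables are stored as coefficient vectors over $(x, \omega_1, \omega_2)$; the helper names (`lin`, `cov`, `costs`, `is_nash`) are illustrative and not from the paper.

```python
from fractions import Fraction as Fr

# A zero-mean random variable linear in (x, w1, w2) is stored as its coefficient
# tuple; x, w1, w2 are independent with unit variance. Helper names are illustrative.
X, W1, W2 = (1, 0, 0), (0, 1, 0), (0, 0, 1)


def lin(*terms):
    """Coefficient tuple of sum(c * v) over (c, v) pairs."""
    return tuple(sum(Fr(c) * v[k] for c, v in terms) for k in range(3))


def cov(a, b):
    """E[a b]; with independent unit-variance components this is a dot product."""
    return sum(p * q for p, q in zip(a, b))


def costs(u1, u2):
    """(J1, J2) with J_i = E[(x + u1 + u2)^2 + u_i^2]."""
    s = lin((1, X), (1, u1), (1, u2))
    return cov(s, s) + cov(u1, u1), cov(s, s) + cov(u2, u2)


def is_nash(u1, u2, info1, info2):
    """First-order conditions E[(s + u_i) z] = 0 for every measurement z of player i."""
    s = lin((1, X), (1, u1), (1, u2))
    g1, g2 = lin((1, s), (1, u1)), lin((1, s), (1, u2))
    return all(cov(g1, z) == 0 for z in info1) and all(cov(g2, z) == 0 for z in info2)


y1, y2 = lin((1, X), (1, W1)), lin((1, X), (1, W2))
ye = lin((1, X), (-1, W1), (-1, W2))               # the extra measurement y_e

u1_star, u2_star = lin((Fr(-1, 5), y1)), lin((Fr(-1, 5), y2))
u2_new = lin((Fr(-1, 5), y2), (Fr(-1, 6), ye))     # gamma_2^0(y2, y_e)

print(costs(u1_star, u2_star))     # J1 = J2 = 13/25 (= 468/900)
print(costs(u1_star, u2_new)[1])   # J2 = 53/150 (= 318/900)
```

The `is_nash` check confirms that both pairs satisfy the stationarity conditions, so the lower cost is obtained at an equilibrium rather than at an arbitrary strategy pair.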


Example 2. This example illustrates Corollary 4.3.

Consider a dynamic Nash game with the general formulation given in Section 3. We choose $A = 0.5$, $\bar x_0 = 0$, $\Sigma_0 = 10$, $\ldots = R = 1$, $Q_i = 20$, $i = 1, 2$. Two kinds of information, I and II, are described below:

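Information I: $\hat x_1(\cdot)$, $\hat x_3(\cdot)$ correspond to

$$y_1(\cdot) = x(\cdot) + v_1(\cdot), \quad y_2(\cdot) = x(\cdot) + v_2(\cdot), \quad v_i(\cdot) \sim N(0, 1),\ i = 1, 2.$$

Information II: $\hat x_1^0(\cdot)$, $\hat x_3^0(\cdot)$ correspond to

$$y_1^0(\cdot) = x(\cdot) + v_1^0(\cdot), \quad y_2^0(\cdot) = 0 \cdot x(\cdot) + v_2^0(\cdot), \quad v_i^0(\cdot) \sim N(0, 1),\ i = 1, 2.$$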

It is easy to see that for Information II, $\hat x_1^0(k) = \hat x_3^0(k)$ at every stage $k$ (since $y_2^0$ carries no information about $x$), and that Information I provides better information for decision maker 1 alone than Information II. We compute the Nash cost of decision maker 1 for different numbers of stages, i.e., $N$ from 1 to 19. The resulting costs are shown in Table 6.1. Notice that Information I is more beneficial to decision maker 1 than Information II. Two features of this fact are: first, it holds independently of $N$, the number of stages; and second, since $A = 0.5$, $\hat x_3(\cdot)$ is not dynamically better than $\hat x_3^0(\cdot)$.


N    Information I    Information II    Benefit to decision maker 1 of better information for him alone
1    16.72872         16.98826          0.259544
2    19.79963         20.12271          0.323073
3    21.68059         22.06824          0.387644
4    23.31423         23.76004          0.445805
5    24.90147         25.40363          0.502162
6    26.48017         27.03831          0.558140
7    28.05730         28.67135          0.614047
8    29.63415         30.30409          0.669940
9    31.21094         31.93677          0.725830
10   32.78773         33.56945          0.781720
11   34.36451         35.20212          0.837610
12   35.94123         36.83479          0.893500
13   37.51808         38.46747          0.949390
14   39.09486         40.10014          1.005280
15   40.67164         41.73281          1.061170
16   42.24843         43.36549          1.117060
17   43.82521         44.99816          1.172950
18   45.40199         46.63083          1.228840
19   46.97877         48.26350          1.284730

Table 6.1. Costs of decision maker 1 in Example 2 under different information versus the number of stages.
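As a quick arithmetic check of Table 6.1, the benefit column should equal the Information II cost minus the Information I cost, and it should be positive for every $N$. The sketch below copies the printed values; the tolerance of 1e-4 is an assumption chosen to absorb rounding in the printed digits.

```python
# Rows of Table 6.1: (N, cost with Information I, cost with Information II, tabulated benefit)
TABLE = [
    (1, 16.72872, 16.98826, 0.259544),
    (2, 19.79963, 20.12271, 0.323073),
    (3, 21.68059, 22.06824, 0.387644),
    (4, 23.31423, 23.76004, 0.445805),
    (5, 24.90147, 25.40363, 0.502162),
    (6, 26.48017, 27.03831, 0.558140),
    (7, 28.05730, 28.67135, 0.614047),
    (8, 29.63415, 30.30409, 0.669940),
    (9, 31.21094, 31.93677, 0.725830),
    (10, 32.78773, 33.56945, 0.781720),
    (11, 34.36451, 35.20212, 0.837610),
    (12, 35.94123, 36.83479, 0.893500),
    (13, 37.51808, 38.46747, 0.949390),
    (14, 39.09486, 40.10014, 1.005280),
    (15, 40.67164, 41.73281, 1.061170),
    (16, 42.24843, 43.36549, 1.117060),
    (17, 43.82521, 44.99816, 1.172950),
    (18, 45.40199, 46.63083, 1.228840),
    (19, 46.97877, 48.26350, 1.284730),
]


def inconsistent_rows(rows, tol=1e-4):
    """N values whose benefit entry differs from (cost II - cost I) by more than tol."""
    return [n for n, cost_i, cost_ii, benefit in rows if abs((cost_ii - cost_i) - benefit) > tol]


differences = [cost_ii - cost_i for _, cost_i, cost_ii, _ in TABLE]
print(inconsistent_rows(TABLE))          # rows failing the consistency check
print(all(d > 0 for d in differences))   # Information I is cheaper for every N
```

The largest deviation occurs at $N = 12$, where the cost difference is 0.89356 against the tabulated 0.893500; the other rows agree to about 1e-5.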


7 Conclusion

In a general two-decision-maker LQG Nash game (static or dynamic) we proved that more or better information for one of the decision makers alone is beneficial to him if he is informationally stronger than his opponent, i.e., he knows all his opponent's information. In a static game, more information for one of the decision makers alone is beneficial to him if such information is orthogonal to both decision makers' information. Such results are quite understandable. Since the Nash solution is an equilibrium solution with a consistency constraint [?], a unilateral improvement of information does not guarantee a benefit to either party. A unilateral improvement of information does, however, guarantee a benefit to the decision maker who has the improvement if his opponent's strategy does not change as a result, so that he can use the improved information to optimize his strategy without constraint.

For his opponent's strategy not to change, the opponent should be totally ignorant of the improved information, which is implied by the orthogonality condition given by Lemma 2.4.

Similar results hold in static and feedback Stackelberg games for both the leader and the follower. The leader in a static Stackelberg game, however, can use any extra information to his benefit.

As noted before, the investigation of the informational properties of the dynamic Nash game is greatly simplified by the formulation of the game, in which a form of separation principle holds and the estimation error is independent of the controls. Without these properties it would be difficult either to define "better information for one decision maker alone" or to solve for the Nash solution, and either difficulty alone makes the problem extremely hard.

An extension of the results obtained in this paper to the N-decision-maker Nash game is straightforward, and such results constitute a fundamental step in designing information structures [?] for large-scale systems.


APPENDIX A

Proof of Lemma 3.1.

Consider the following state equation and measurements:

$$\bar x(k+1) = A\bar x(k) + \omega(k), \quad \bar x(0) = x_0 \qquad (A1)$$

$$\bar y_i(k) = H_i\bar x(k) + v_i(k), \quad i = 1, 2, \qquad (A2)$$

where $x_0$, $\{\omega(k)\}$ and $\{v_i(k)\}$ are defined as in Section 3. By comparing (A1) with (17) we immediately have

$$x(k) = \bar x(k) + \sum_{n=0}^{k-1} A^{k-n-1}[B_1u_1(n) + B_2u_2(n)]. \qquad (A3)$$

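Let

$$\bar x_1(k) = E[\bar x(k) \mid \bar y_1(0), \ldots, \bar y_1(k)] \qquad (A4)$$

and

$$\bar x_3(k) = E[\bar x(k) \mid \bar y_1(0), \ldots, \bar y_1(k), \bar y_2(0), \ldots, \bar y_2(k)]; \qquad (A5)$$

then $\bar x_i(k)$, $i = 1, 3$, are given exactly by the Kalman filter equations (19), except that (19b) is replaced by its control-free counterpart for the system (A1)-(A2), and

$$\hat x_i(k) = \bar x_i(k) + \varphi_k, \quad i = 1, 3, \qquad (A7)$$

where

$$\varphi_k = \sum_{n=0}^{k-1} A^{k-n-1}[B_1u_1(n) + B_2u_2(n)].$$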


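Since the information defining $\bar x_3(k)$ is a refinement of that defining $\bar x_1(k)$, we obtain

$$\bar x_1(k) = E[\bar x(k) \mid \bar y_1(0), \ldots, \bar y_1(k)] = E\big[E[\bar x(k) \mid \bar y_1(0), \ldots, \bar y_1(k), \bar y_2(0), \ldots, \bar y_2(k)] \,\big|\, \bar y_1(0), \ldots, \bar y_1(k)\big] = E[\bar x_3(k) \mid \bar y_1(0), \ldots, \bar y_1(k)].$$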


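Hence

$$E[\bar x_3(k) \mid \bar x_1(k)] = E\big[E[\bar x_3(k) \mid \bar y_1(0), \ldots, \bar y_1(k)] \,\big|\, \bar x_1(k)\big] = E[\bar x_1(k) \mid \bar x_1(k)] = \bar x_1(k).$$

(A7) indicates that

$$E[\hat x_3(k) \mid \hat x_1(k)] = E[\bar x_3(k) + \varphi_k \mid \bar x_1(k) + \varphi_k] = E[\bar x_3(k) \mid \bar x_1(k)] + \varphi_k = \bar x_1(k) + \varphi_k = \hat x_1(k).$$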

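By the projection theorem [?], $\hat x_4(k) = \hat x_3(k) - E[\hat x_3(k) \mid \hat x_1(k)] = \hat x_3(k) - \hat x_1(k)$ is of zero mean and orthogonal to $\hat x_1(k)$, i.e., $E[\hat x_4(k)] = 0$ and $\hat x_4(k) \perp \hat x_1(k)$. Finally,

$$E[\hat x_3(k)\hat x_3'(k)] = E[(\hat x_1(k) + \hat x_4(k))(\hat x_1(k) + \hat x_4(k))'] = E[\hat x_1(k)\hat x_1'(k)] + E[\hat x_4(k)\hat x_4'(k)],$$

so that, since $E[x(k)x'(k)] = E[\hat x_i(k)\hat x_i'(k)] + \Sigma_i(k)$ for $i = 1, 3$,

$$E[\hat x_4(k)\hat x_4'(k)] = E[\hat x_3(k)\hat x_3'(k)] - E[\hat x_1(k)\hat x_1'(k)] = \Sigma_1(k) - \Sigma_3(k).$$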
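The orthogonal decomposition behind Lemma 3.1 can be seen in a single-stage scalar case. The sketch below is illustrative only: it assumes hypothetical variances for $x$, $v_1$, $v_2$, takes $y_1 = x + v_1$ and $y_2 = x + v_2$, computes $\hat x_1 = E[x \mid y_1]$ and $\hat x_3 = E[x \mid y_1, y_2]$ as linear projections, and checks that $\hat x_4 = \hat x_3 - \hat x_1$ is orthogonal to $\hat x_1$, that $E[\hat x_3 \mid y_1] = \hat x_1$, and that $E[\hat x_4^2] = \Sigma_1 - \Sigma_3$. It does not run the Kalman filter over several stages.

```python
from fractions import Fraction as Fr

# Hypothetical single-stage example (not from the paper): x, v1, v2 independent,
# zero mean, with the variances below. A random variable linear in (x, v1, v2)
# is stored as its coefficient tuple.
VAR = (Fr(2), Fr(1), Fr(3))
X, V1, V2 = (1, 0, 0), (0, 1, 0), (0, 0, 1)


def combo(*terms):
    """Coefficient tuple of sum(c * v) over (c, v) pairs."""
    return tuple(sum(Fr(c) * v[i] for c, v in terms) for i in range(3))


def cov(a, b):
    """E[a b] for independent components with variances VAR."""
    return sum(p * q * s for p, q, s in zip(a, b, VAR))


def project(target, basis):
    """Linear least-squares estimate of target from one or two observations."""
    if len(basis) == 1:
        (b,) = basis
        return combo((cov(target, b) / cov(b, b), b))
    b1, b2 = basis
    g11, g12, g22 = cov(b1, b1), cov(b1, b2), cov(b2, b2)
    r1, r2 = cov(target, b1), cov(target, b2)
    det = g11 * g22 - g12 * g12        # normal equations solved by Cramer's rule
    return combo(((r1 * g22 - r2 * g12) / det, b1), ((r2 * g11 - r1 * g12) / det, b2))


def error_variance(estimate):
    e = combo((1, X), (-1, estimate))
    return cov(e, e)


y1 = combo((1, X), (1, V1))        # coarse information
y2 = combo((1, X), (1, V2))        # additional measurement
x1 = project(X, [y1])              # analogue of x_1 = E[x | y1]
x3 = project(X, [y1, y2])          # analogue of x_3 = E[x | y1, y2]
x4 = combo((1, x3), (-1, x1))      # x_4 = x_3 - x_1
sigma1, sigma3 = error_variance(x1), error_variance(x3)
print(cov(x4, x1), cov(x4, x4), sigma1 - sigma3)   # 0, then two equal values
```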


APPENDIX B

In this appendix we prove Theorem 3.2. The proof is similar to that of Theorem 4.1. Since the Nash solution $\gamma_i^*$ of decision maker $i$ is a solution of the optimal control problem in which decision maker $j$ ($j \neq i$) fixes his strategy at $\gamma_j^*$, we can solve the problem by dynamic programming. Recall that $J_i(k)$ denotes the cost-to-go of decision maker $i$ at stage $k$.

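At stage $N$,

$$J_i(N) = E[x'(N)P_ix(N)] = E[\hat x_3'(N)P_i\hat x_3(N)] + \operatorname{tr}[P_i\Sigma_3(N)] = E[\hat x_3'(N)L_i(N)\hat x_3(N)] + K_i(N), \quad i = 1, 2, \qquad (B1)$$

where

$$L_i(N) = P_i, \quad K_i(N) = \operatorname{tr}[P_i\Sigma_3(N)].$$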

At stage $N-1$,

$$J_i(N-1) = E[x'(N-1)P_ix(N-1) + u_i'(N-1)u_i(N-1) + u_j'(N-1)Q_iu_j(N-1) + x'(N)P_ix(N)], \quad j \neq i,\ i, j = 1, 2. \qquad (B2)$$

After receiving $I_i(N-1)$, decision maker $i$'s objective is to minimize $\bar J_i(N-1)$, given by

$$\bar J_i(N-1) = E[x'(N-1)P_ix(N-1) + u_i'(N-1)u_i(N-1) + u_j'(N-1)Q_iu_j(N-1) + x'(N)P_ix(N) \mid I_i(N-1)]. \qquad (B3)$$

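By applying the Kalman filter equations (19) and Lemma 3.1 we obtain

$$\begin{aligned}
\bar J_1(N-1) ={}& u_1'(N-1)u_1(N-1) + u_2'(N-1)Q_1u_2(N-1)\\
&+ [A\hat x_3(N-1) + B_1u_1(N-1) + B_2u_2(N-1)]'L_1(N)[A\hat x_3(N-1) + B_1u_1(N-1) + B_2u_2(N-1)]\\
&+ \hat x_3'(N-1)P_1\hat x_3(N-1) + \operatorname{tr}\{P_1\Sigma_3(N-1) + L_1(N)[\Sigma_3(N/N-1) - \Sigma_3(N)]\} + K_1(N)
\end{aligned} \qquad (B4)$$

$$\begin{aligned}
\bar J_2(N-1) ={}& E\{[A\hat x_3(N-1) + B_1u_1(N-1) + B_2u_2(N-1)]'L_2(N)[A\hat x_3(N-1) + B_1u_1(N-1) + B_2u_2(N-1)] \mid I_2(N-1)\}\\
&+ u_2'(N-1)u_2(N-1) + E[u_1'(N-1)Q_2u_1(N-1) \mid I_2(N-1)] + \hat x_1'(N-1)P_2\hat x_1(N-1)\\
&+ \operatorname{tr}\{P_2\Sigma_1(N-1) + L_2(N)[\Sigma_3(N/N-1) - \Sigma_3(N)]\} + K_2(N)
\end{aligned} \qquad (B5)$$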


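Since $\bar J_i(N-1)$ is convex in $u_i(N-1)$, the Nash pair at stage $N-1$, $(\gamma_1^{*N-1}, \gamma_2^{*N-1})$, is chosen such that

$$\frac{\partial \bar J_i(N-1)}{\partial u_i(N-1)}\bigg|_{(\gamma_1^{*N-1},\, \gamma_2^{*N-1})} = 0, \quad i = 1, 2. \qquad (B6)$$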

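We then have

$$\gamma_1^{*N-1}(I_1(N-1)) = -[I + B_1'L_1(N)B_1]^{-1}B_1'L_1(N)[A\hat x_3(N-1) + B_2\gamma_2^{*N-1}(I_2(N-1))] \qquad (B7)$$

$$\gamma_2^{*N-1}(I_2(N-1)) = -[I + B_2'L_2(N)B_2]^{-1}B_2'L_2(N)\big[A\hat x_1(N-1) + B_1E[\gamma_1^{*N-1}(I_1(N-1)) \mid I_2(N-1)]\big] \qquad (B8)$$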

From (B7) and by Lemma 3.1(i) we obtain

$$E[\gamma_1^{*N-1}(I_1(N-1)) \mid I_2(N-1)] = -[I + B_1'L_1(N)B_1]^{-1}B_1'L_1(N)[A\hat x_1(N-1) + B_2\gamma_2^{*N-1}(I_2(N-1))]. \qquad (B9)$$

Substituting (B9) into (B8),

$$\gamma_2^{*N-1}(I_2(N-1)) = -[I + B_2'L_2(N)B_2]^{-1}B_2'L_2(N)\big[A\hat x_1(N-1) - B_1(I + B_1'L_1(N)B_1)^{-1}B_1'L_1(N)\big(A\hat x_1(N-1) + B_2\gamma_2^{*N-1}(I_2(N-1))\big)\big]. \qquad (B10)$$

By applying the following formula (B11) several times we obtain (B12):

$$Z_1(I + Z_2Z_1)^{-1} = (I + Z_1Z_2)^{-1}Z_1 \qquad (B11)$$

$$\gamma_2^{*N-1}(I_2(N-1)) = -B_2'L_2(N)[I + B_1B_1'L_1(N) + B_2B_2'L_2(N)]^{-1}A\hat x_1(N-1) = F_2(N-1)\hat x_1(N-1), \qquad (B12)$$

where

$$F_2(N-1) = -B_2'L_2(N)[I + B_1B_1'L_1(N) + B_2B_2'L_2(N)]^{-1}A. \qquad (B13)$$


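Substituting (B12) into (B7) we obtain

$$\begin{aligned}
\gamma_1^{*N-1}(I_1(N-1)) ={}& -B_1'L_1(N)[I + B_1B_1'L_1(N) + B_2B_2'L_2(N)]^{-1}A\hat x_1(N-1) - B_1'L_1(N)[I + B_1B_1'L_1(N)]^{-1}A\hat x_4(N-1)\\
={}& F_{11}(N-1)\hat x_1(N-1) + F_{14}(N-1)\hat x_4(N-1),
\end{aligned} \qquad (B14)$$

where

$$F_{11}(N-1) = -B_1'L_1(N)[I + B_1B_1'L_1(N) + B_2B_2'L_2(N)]^{-1}A \qquad (B15)$$

and

$$F_{14}(N-1) = -B_1'L_1(N)[I + B_1B_1'L_1(N)]^{-1}A. \qquad (B16)$$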

Notice that $(\gamma_1^{*N-1}, \gamma_2^{*N-1})$ given by (B12) and (B14) exists and is unique if $[I + B_1B_1'L_1(N) + B_2B_2'L_2(N)]$ is nonsingular.
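Two steps of this appendix can be checked numerically: the matrix identity (B11) and the claim that the gains (B13), (B15), (B16) satisfy the stage-$(N-1)$ conditions (B7) and (B8). The sketch below uses exact rational arithmetic; the 2×2 test matrices and the scalar values standing in for $A$, $B_i$, $L_i(N)$, $\hat x_1$, $\hat x_4$ are arbitrary choices, and the scalar case only stands in for the matrix case. It uses $E[\hat x_4(N-1) \mid I_2(N-1)] = 0$, which is what Lemma 3.1 supplies in the step leading to (B9).

```python
from fractions import Fraction as Fr


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def plus_identity(a):
    """I + a for a 2x2 matrix."""
    return [[a[i][j] + (1 if i == j else 0) for j in range(2)] for i in range(2)]


def mat_inv(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def push_through_holds(z1, z2):
    """Formula (B11): Z1 (I + Z2 Z1)^{-1} == (I + Z1 Z2)^{-1} Z1."""
    lhs = mat_mul(z1, mat_inv(plus_identity(mat_mul(z2, z1))))
    rhs = mat_mul(mat_inv(plus_identity(mat_mul(z1, z2))), z1)
    return lhs == rhs


def stage_gains(a, b1, b2, l1, l2):
    """Scalar versions of (B13), (B15), (B16); l1, l2 stand for L_1(N), L_2(N)."""
    d = 1 + b1 * b1 * l1 + b2 * b2 * l2
    return -b2 * l2 * a / d, -b1 * l1 * a / d, -b1 * l1 * a / (1 + b1 * b1 * l1)


def satisfies_b7_b8(a, b1, b2, l1, l2, x1, x4):
    """Check that u1 = F11 x1 + F14 x4 and u2 = F2 x1 solve (B7) and (B8)."""
    f2, f11, f14 = stage_gains(a, b1, b2, l1, l2)
    u1, u2 = f11 * x1 + f14 * x4, f2 * x1
    x3 = x1 + x4                                   # decomposition x3 = x1 + x4
    b7 = u1 == -b1 * l1 * (a * x3 + b2 * u2) / (1 + b1 * b1 * l1)
    # Decision maker 2 sees only x1: E[u1 | I2] = f11 * x1 because E[x4 | I2] = 0.
    b8 = u2 == -b2 * l2 * (a * x1 + b1 * f11 * x1) / (1 + b2 * b2 * l2)
    return b7 and b8


# Arbitrary test data (not from the paper).
Z1 = [[Fr(1), Fr(2)], [Fr(0), Fr(1)]]
Z2 = [[Fr(3), Fr(1)], [Fr(1), Fr(2)]]
print(push_through_holds(Z1, Z2))
print(satisfies_b7_b8(Fr(1, 2), Fr(1), Fr(2), Fr(3), Fr(5, 2), Fr(7), Fr(-3)))
```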

Substituting (B12) and (B14) into (B2) we obtain

$$J_i(N-1) = E[\hat x_3'(N-1)L_i(N-1)\hat x_3(N-1)] + K_i(N-1), \qquad (B17)$$

where $L_i(N-1)$ and $K_i(N-1)$ are given by (26) and (33), respectively. As we can see, (B17) and (B1) are of the same form. In deriving the Nash pair at stage $N-2$, we repeat what we did at stage $N-1$. An inductive argument then proves the theorem.


APPENDIX C

In this appendix we derive the feedback Stackelberg solution; the problem was stated in Section 5.

Theorem C: There exists a unique solution to the feedback Stackelberg game. (i) The solution for Case A is

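$$u_{1A}(k) = F_{11A}(k)\hat x_1(k) + F_{14A}(k)\hat x_4(k) \qquad (C1a)$$

$$u_{2A}(k) = \gamma_{2A}(I_2(k)) = F_{2A}(k)\big(A\hat x_1(k) + B_1E[u_{1A}(k) \mid I_2(k)]\big) = F_{21A}(k)\hat x_1(k) \qquad (C1b)$$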

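where

$$F_{11A}(k) = -B_1'Z_A(k+1)[I + B_1B_1'Z_A(k+1)]^{-1}A \qquad (C2)$$

$$F_{14A}(k) = -B_1'L_{1A}(k+1)[I + B_1B_1'L_{1A}(k+1)]^{-1}A \qquad (C3)$$

$$F_{2A}(k) = -B_2'L_{2A}(k+1)[I + B_2B_2'L_{2A}(k+1)]^{-1} \qquad (C4)$$

$$F_{21A}(k) = -B_2'L_{2A}(k+1)[I + B_2B_2'L_{2A}(k+1)]^{-1}[I + B_1B_1'Z_A(k+1)]^{-1}A \qquad (C5)$$

$$Z_A(k) = \{[I + B_2B_2'L_{2A}(k)]^{-1}\}'[L_{2A}(k)B_2Q_1B_2'L_{2A}(k) + L_{1A}(k)][I + B_2B_2'L_{2A}(k)]^{-1} \qquad (C6)$$

$$\begin{aligned}
L_{1A}(k) ={}& P_1 + F_{11A}'(k)F_{11A}(k) + F_{21A}'(k)Q_1F_{21A}(k)\\
&+ [A + B_1F_{11A}(k) + B_2F_{21A}(k)]'L_{1A}(k+1)[A + B_1F_{11A}(k) + B_2F_{21A}(k)], \quad L_{1A}(N) = P_1
\end{aligned} \qquad (C7)$$

$$\begin{aligned}
L_{2A}(k) ={}& P_2 + F_{11A}'(k)Q_2F_{11A}(k) + F_{21A}'(k)F_{21A}(k)\\
&+ [A + B_1F_{11A}(k) + B_2F_{21A}(k)]'L_{2A}(k+1)[A + B_1F_{11A}(k) + B_2F_{21A}(k)], \quad L_{2A}(N) = P_2.
\end{aligned} \qquad (C8)$$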


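Their costs-to-go at stage $k$ are, respectively,

$$J_{1A}(k) = E[\hat x_3'(k)L_{1A}(k)\hat x_3(k)] + K_{1A}(k) \qquad (C9)$$

$$J_{2A}(k) = E[\hat x_3'(k)L_{2A}(k)\hat x_3(k)] + K_{2A}(k), \qquad (C10)$$

where

$$\begin{aligned}
K_{1A}(k) ={}& \operatorname{tr}\{[P_1 + A'L_{1A}(k+1)A - L_{14A}(k)]\Sigma_3(k) - L_{1A}(k+1)\Sigma_3(k+1) + L_{14A}(k)\Sigma_1(k) + L_{1A}(k+1)R\}\\
&+ K_{1A}(k+1), \quad K_{1A}(N) = \operatorname{tr}\{P_1\Sigma_3(N)\}
\end{aligned} \qquad (C11)$$

$$\begin{aligned}
K_{2A}(k) ={}& \operatorname{tr}\{[P_2 + A'L_{2A}(k+1)A - L_{24A}(k)]\Sigma_3(k) - L_{2A}(k+1)\Sigma_3(k+1) + L_{24A}(k)\Sigma_1(k) + L_{2A}(k+1)R\}\\
&+ K_{2A}(k+1), \quad K_{2A}(N) = \operatorname{tr}\{P_2\Sigma_3(N)\}
\end{aligned} \qquad (C12)$$

$$L_{14A}(k) = P_1 + F_{14A}'(k)F_{14A}(k) + [A + B_1F_{14A}(k)]'L_{1A}(k+1)[A + B_1F_{14A}(k)] - L_{1A}(k) \qquad (C13)$$

$$L_{24A}(k) = P_2 + F_{14A}'(k)Q_2F_{14A}(k) + [A + B_1F_{14A}(k)]'L_{2A}(k+1)[A + B_1F_{14A}(k)] - L_{2A}(k) \qquad (C14)$$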

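(ii) The solution for Case B is

$$u_{1B}(k) = \gamma_{1B}(I_1(k)) = F_{11B}(k)\hat x_1(k) \qquad (C15a)$$

$$u_{2B}(k) = \gamma_{2B}(I_2(k)) = F_{2B}(k)[A\hat x_3(k) + B_1\gamma_{1B}(I_1(k))] = F_{21B}(k)\hat x_1(k) + F_{24B}(k)\hat x_4(k) \qquad (C15b)$$

where

$$F_{11B}(k) = -B_1'Z_B(k+1)[I + B_1B_1'Z_B(k+1)]^{-1}A \qquad (C16)$$

$$F_{2B}(k) = -B_2'L_{2B}(k+1)[I + B_2B_2'L_{2B}(k+1)]^{-1} \qquad (C17)$$

$$F_{21B}(k) = -B_2'L_{2B}(k+1)[I + B_2B_2'L_{2B}(k+1)]^{-1}[I + B_1B_1'Z_B(k+1)]^{-1}A \qquad (C18)$$

$$F_{24B}(k) = -B_2'L_{2B}(k+1)[I + B_2B_2'L_{2B}(k+1)]^{-1}A \qquad (C19)$$

$$Z_B(k) = \{[I + B_2B_2'L_{2B}(k)]^{-1}\}'[L_{2B}(k)B_2Q_1B_2'L_{2B}(k) + L_{1B}(k)][I + B_2B_2'L_{2B}(k)]^{-1} \qquad (C20)$$

$$\begin{aligned}
L_{1B}(k) ={}& P_1 + F_{11B}'(k)F_{11B}(k) + F_{21B}'(k)Q_1F_{21B}(k)\\
&+ [A + B_1F_{11B}(k) + B_2F_{21B}(k)]'L_{1B}(k+1)[A + B_1F_{11B}(k) + B_2F_{21B}(k)], \quad L_{1B}(N) = P_1
\end{aligned} \qquad (C21)$$

$$\begin{aligned}
L_{2B}(k) ={}& P_2 + F_{11B}'(k)Q_2F_{11B}(k) + F_{21B}'(k)F_{21B}(k)\\
&+ [A + B_1F_{11B}(k) + B_2F_{21B}(k)]'L_{2B}(k+1)[A + B_1F_{11B}(k) + B_2F_{21B}(k)], \quad L_{2B}(N) = P_2.
\end{aligned} \qquad (C22)$$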

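Their costs-to-go at stage $k$ are, respectively,

$$J_{1B}(k) = E[\hat x_3'(k)L_{1B}(k)\hat x_3(k)] + K_{1B}(k) \qquad (C23)$$

$$J_{2B}(k) = E[\hat x_3'(k)L_{2B}(k)\hat x_3(k)] + K_{2B}(k), \qquad (C24)$$

where

$$\begin{aligned}
K_{1B}(k) ={}& \operatorname{tr}\{[P_1 + A'L_{1B}(k+1)A - L_{14B}(k)]\Sigma_3(k) - L_{1B}(k+1)\Sigma_3(k+1) + L_{14B}(k)\Sigma_1(k) + L_{1B}(k+1)R\}\\
&+ K_{1B}(k+1), \quad K_{1B}(N) = \operatorname{tr}\{P_1\Sigma_3(N)\}
\end{aligned} \qquad (C25)$$

$$\begin{aligned}
K_{2B}(k) ={}& \operatorname{tr}\{[P_2 + A'L_{2B}(k+1)A - L_{24B}(k)]\Sigma_3(k) - L_{2B}(k+1)\Sigma_3(k+1) + L_{24B}(k)\Sigma_1(k) + L_{2B}(k+1)R\}\\
&+ K_{2B}(k+1), \quad K_{2B}(N) = \operatorname{tr}\{P_2\Sigma_3(N)\}
\end{aligned} \qquad (C26)$$

$$L_{14B}(k) = P_1 + F_{24B}'(k)Q_1F_{24B}(k) + [A + B_2F_{24B}(k)]'L_{1B}(k+1)[A + B_2F_{24B}(k)] - L_{1B}(k) \qquad (C27)$$

$$L_{24B}(k) = P_2 + F_{24B}'(k)F_{24B}(k) + [A + B_2F_{24B}(k)]'L_{2B}(k+1)[A + B_2F_{24B}(k)] - L_{2B}(k) \qquad (C28)$$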

Remark: It is easy to see that in the above theorem

$$F_{11B}(k) = F_{11A}(k), \quad Z_B(k) = Z_A(k), \quad F_{2B}(k) = F_{2A}(k),$$

$$L_{1B}(k) = L_{1A}(k), \quad F_{21B}(k) = F_{21A}(k), \quad L_{2B}(k) = L_{2A}(k).$$

Proof of Theorem C: We prove part (i) only; the proof of part (ii) is similar.

Feedback Stackelberg strategies have the property that they are in static Stackelberg equilibrium at every stage of the problem. This property follows from their definition, and hence we can solve the problem by going backwards (a dynamic-programming type of approach).

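At stage $N$ (no more decisions to be made), the cost-to-go of decision maker $i$ is

$$J_i(N) = E[x'(N)P_ix(N)] = E[\hat x_3'(N)P_i\hat x_3(N)] + \operatorname{tr}\{P_i\Sigma_3(N)\} = E[\hat x_3'(N)L_{iA}(N)\hat x_3(N)] + K_{iA}(N), \qquad (C29)$$

where

$$L_{iA}(N) = P_i, \quad K_{iA}(N) = \operatorname{tr}\{P_i\Sigma_3(N)\}.$$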

At stage $N-1$ ($I_i(N-1)$ is available), decision maker $i$'s objective is to minimize $\bar J_i(N-1)$, given by

$$\bar J_i(N-1) = E[x'(N-1)P_ix(N-1) + u_i'(N-1)u_i(N-1) + u_j'(N-1)Q_iu_j(N-1) + x'(N)P_ix(N) \mid I_i(N-1)]. \qquad (C30)$$


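By applying the Kalman filter equations (19) and Lemma 3.1 we obtain

$$\begin{aligned}
\bar J_1(N-1) ={}& u_1'(N-1)u_1(N-1) + u_2'(N-1)Q_1u_2(N-1)\\
&+ [A\hat x_3(N-1) + B_1u_1(N-1) + B_2u_2(N-1)]'L_{1A}(N)[A\hat x_3(N-1) + B_1u_1(N-1) + B_2u_2(N-1)]\\
&+ \hat x_3'(N-1)P_1\hat x_3(N-1) + \operatorname{tr}\{P_1\Sigma_3(N-1) + L_{1A}(N)[\Sigma_3(N/N-1) - \Sigma_3(N)]\} + K_{1A}(N)
\end{aligned} \qquad (C31)$$

and

$$\begin{aligned}
\bar J_2(N-1) ={}& E\{[A\hat x_3(N-1) + B_1u_1(N-1) + B_2u_2(N-1)]'L_{2A}(N)[A\hat x_3(N-1) + B_1u_1(N-1) + B_2u_2(N-1)] \mid I_2(N-1)\}\\
&+ u_2'(N-1)u_2(N-1) + E[u_1'(N-1)Q_2u_1(N-1) \mid I_2(N-1)] + \hat x_1'(N-1)P_2\hat x_1(N-1)\\
&+ \operatorname{tr}\{P_2\Sigma_1(N-1) + L_{2A}(N)[\Sigma_3(N/N-1) - \Sigma_3(N)]\} + K_{2A}(N).
\end{aligned} \qquad (C32)$$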

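To any strategy $\gamma_{1A}^{N-1}(I_1(N-1))$ announced by the leader, the follower's rational reaction set is a singleton, i.e.,

$$\gamma_{2A}^{N-1}(I_2(N-1)) = -B_2'L_{2A}(N)[I + B_2B_2'L_{2A}(N)]^{-1}\big[A\hat x_1(N-1) + B_1E[\gamma_{1A}^{N-1}(I_1(N-1)) \mid I_2(N-1)]\big]. \qquad (C33)$$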

Substituting $u_2(N-1)$ given by (C33) into (C31) and optimizing $\bar J_1(N-1)$ with respect to $u_1(N-1)$ we obtain

$$u_1^*(N-1) = F_{11A}(N-1)\hat x_1(N-1) + F_{14A}(N-1)\hat x_4(N-1), \qquad (C34)$$

where $F_{11A}(N-1)$ and $F_{14A}(N-1)$ are given, respectively, by (C2) and (C3). Substituting (C34) into (C33) we obtain $u_2^*(N-1)$ given by (C1b). Substituting $u_1^*(N-1)$ and $u_2^*(N-1)$ into $J_i(N-1)$ we obtain (C9) and (C10) for $k = N-1$. The proof of this feedback Stackelberg solution can then be concluded by an inductive argument.
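For readers who want to experiment with Theorem C, the Case A recursion (C2)-(C8) and (C13) can be coded directly in the scalar case. The sketch below is a minimal, hypothetical implementation: the function names and the parameter values ($A = 1/2$, $B_i = P_i = 1$, $Q_i = 2$, four stages) are assumptions, and only the deterministic $\hat x_1$ part of the stage problem is checked. The test verifies the two properties used in the proof: for a given $\zeta = A\hat x + B_1u_1$ the follower's gain $F_{2A}$ minimizes his stage cost, and once that reaction is substituted, $F_{11A}$ minimizes the leader's stage cost. Both checks use the fact that a scalar quadratic takes equal values at points symmetric about its minimizer.

```python
from fractions import Fraction as Fr


def case_a_recursion(a, b1, b2, p1, p2, q1, q2, n_stages):
    """Scalar backward recursion (C2)-(C8) and (C13); one dict per stage k = N-1, ..., 0."""
    l1, l2 = p1, p2                                          # L_1A(N) = P1, L_2A(N) = P2
    stages = []
    for k in reversed(range(n_stages)):
        # Gains at stage k use L_1A(k+1) = l1 and L_2A(k+1) = l2.
        z = (l2 * b2 * q1 * b2 * l2 + l1) / (1 + b2 * b2 * l2) ** 2   # Z_A(k+1), (C6)
        f11 = -b1 * z * a / (1 + b1 * b1 * z)                         # (C2)
        f14 = -b1 * l1 * a / (1 + b1 * b1 * l1)                       # (C3)
        f2 = -b2 * l2 / (1 + b2 * b2 * l2)                            # (C4)
        f21 = f2 * a / (1 + b1 * b1 * z)                              # (C5)
        m = a + b1 * f11 + b2 * f21                                   # closed-loop factor
        new_l1 = p1 + f11 * f11 + q1 * f21 * f21 + m * m * l1         # (C7)
        new_l2 = p2 + q2 * f11 * f11 + f21 * f21 + m * m * l2         # (C8)
        l14 = p1 + f14 * f14 + (a + b1 * f14) ** 2 * l1 - new_l1      # (C13)
        stages.append({"k": k, "l1_next": l1, "l2_next": l2, "f11": f11,
                       "f14": f14, "f2": f2, "f21": f21, "l14": l14})
        l1, l2 = new_l1, new_l2
    return stages


def follower_cost(st, b2, zeta, u2):
    """Follower's u2-dependent stage cost u2^2 + L_2A(k+1)(zeta + b2 u2)^2 (zeta = A x + B1 u1)."""
    return u2 * u2 + st["l2_next"] * (zeta + b2 * u2) ** 2


def leader_cost(st, a, b1, b2, q1, u1, x=Fr(1)):
    """Leader's stage cost after substituting the follower's reaction u2 = F_2A zeta."""
    zeta = a * x + b1 * u1
    u2 = st["f2"] * zeta
    return u1 * u1 + q1 * u2 * u2 + st["l1_next"] * (zeta + b2 * u2) ** 2


# Hypothetical scalar data (not from the paper).
A, B1, B2, Q1 = Fr(1, 2), Fr(1), Fr(1), Fr(2)
stages = case_a_recursion(A, B1, B2, Fr(1), Fr(1), Q1, Fr(2), n_stages=4)
for st in stages:
    print(st["k"], float(st["f11"]), float(st["f14"]), float(st["f21"]))
```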


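APPENDIX D

In this appendix we prove Theorem 5.4. We prove part (i) only; the proof of part (ii) is similar.

From equations (C9) and (C11) of Appendix C we obtain that the cost of the leader in Case A is

$$\begin{aligned}
J_{1A}(0) = \operatorname{tr}\Big\{&L_{1A}(0)E[\hat x_3(0)\hat x_3'(0)] + [P_1 + A'L_{1A}(1)A - L_{14A}(0)]\Sigma_3(0) + L_{1A}(1)R + L_{14A}(0)\Sigma_1(0)\\
&+ \sum_{k=1}^{N-1}\big([P_1 + A'L_{1A}(k+1)A - L_{1A}(k) - L_{14A}(k)]\Sigma_3(k) + L_{1A}(k+1)R + L_{14A}(k)\Sigma_1(k)\big)\Big\}.
\end{aligned} \qquad (D1)$$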

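Let $J_{1A}^{I}(0)$ and $J_{1A}^{II}(0)$ correspond to Information I and II, respectively. Using $E[\hat x_3(0)\hat x_3'(0)] = E[x(0)x'(0)] - \Sigma_3(0)$ in (D1), we then have

$$J_{1A}^{II}(0) - J_{1A}^{I}(0) = \sum_{k=0}^{N-1}\operatorname{tr}\big\{[P_1 + A'L_{1A}(k+1)A - L_{1A}(k) - L_{14A}(k)][\Sigma_3^{II}(k) - \Sigma_3^{I}(k)] + L_{14A}(k)[\Sigma_1^{II}(k) - \Sigma_1^{I}(k)]\big\}. \qquad (D2)$$

If Information I provides better information for the leader alone than Information II, then the corresponding lemma of Section 4 implies $J_{1A}^{II}(0) \ge J_{1A}^{I}(0)$ if

$$P_1 + A'L_{1A}(k+1)A - L_{1A}(k) - L_{14A}(k) \ge 0 \quad \text{for every } k = 0, 1, \ldots, N-1.$$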


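Substituting equations (C7) and (C13) of Appendix C into the left-hand side of the above inequality we obtain

$$\begin{aligned}
P_1 + A'L_{1A}(k+1)A - L_{1A}(k) - L_{14A}(k) ={}& \{B_1'L_{1A}(k+1)[I + B_1B_1'L_{1A}(k+1)]^{-1}A\}'[I + B_1'L_{1A}(k+1)B_1]\\
&\cdot B_1'L_{1A}(k+1)[I + B_1B_1'L_{1A}(k+1)]^{-1}A \ \ge\ 0.
\end{aligned} \qquad (D3)$$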
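Identity (D3) can be spot-checked in the scalar case, where its right-hand side reduces to $[B_1L_{1A}(k+1)A]^2/[1 + B_1^2L_{1A}(k+1)]$. The sketch below assumes scalar data with arbitrary values (not from the paper). Because $L_{1A}(k)$ enters (C13) and the left-hand side of (D3) with opposite signs, only $L_{1A}(k+1)$ is needed.

```python
from fractions import Fraction as Fr


def d3_sides(a, b1, p1, l_next):
    """Both sides of (D3) for scalar data; l_next stands for L_1A(k+1).

    L_1A(k) cancels, so only the sum L_14A(k) + L_1A(k) from (C13) is formed."""
    f14 = -b1 * l_next * a / (1 + b1 * b1 * l_next)                # (C3)
    l14_plus_l1 = p1 + f14 * f14 + (a + b1 * f14) ** 2 * l_next   # (C13) rearranged
    lhs = p1 + a * a * l_next - l14_plus_l1
    g = b1 * l_next * a / (1 + b1 * b1 * l_next)                    # B1' L [I + B1 B1' L]^{-1} A
    rhs = g * (1 + b1 * b1 * l_next) * g
    return lhs, rhs


# Arbitrary scalar test points (hypothetical).
CASES = [(Fr(1, 2), Fr(1), Fr(1), Fr(3)), (Fr(-2), Fr(3, 2), Fr(5), Fr(1, 4))]
for args in CASES:
    lhs, rhs = d3_sides(*args)
    print(lhs, rhs, lhs >= 0)
```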
A two-person one-act LQG Nash game is considered under three different information structures: explicit control sharing, implicit control sharing and static information. The relations among the corresponding solutions and their impacts on the resulting costs are studied.


1. Introduction

In game problems, the players have certain kinds of information, and they make decisions based on this information. We say that there is explicit control sharing (ECS) in a game if a player's information includes the previous control values of other players. Two previous works concerning the impact of ECS on the optimal costs in Nash games were reported in [1] and [2]. In [1] a two-person LQG Nash game was considered where the information structure is partially nested and each player acts once, and it was shown (theorem 2 of [1]) that the first player might do better if he reveals his control value to the second player than he could do in a static information structure (SIS). It is known that in Nash games with ECS there in general exist many solutions [8]. Uchida considered an example of a two-person LQG Nash game [2] where the information is partially nested and each player acts once, and showed that among the nonunique solutions under ECS, one of them is equivalent to the SIS solution. Furthermore, it is claimed in [2] that this SIS solution gives a local minimum of the first player's cost among the linear class of the nonunique solutions. In other words, the first player might do better, at least locally, in a SIS than if he reveals his control value to the second player. Uchida's claim, which he did not prove, and the result of Ho, Blau and Basar in [1] seem to contradict each other.

In this paper we consider a two-person LQG Nash game where the information is partially nested and each player acts once. We study the impact on the first player's Nash cost of his revealing his control value explicitly and implicitly to the second player. By implicit control sharing (ICS) we mean that player 2 has a noise-corrupted measurement which is affine in the system state and player 1's control. Our aim is to relate the Nash solutions under ECS to those under ICS and give a full view of

* This research was supported by AFOSR grant 82-0174 and by the USC Faculty Research and Innovation Fund.
solutions exist under certain nonsingularity conditions. Here we state the solutions; for proofs we refer to [6]-[8]. The impact of ICS and ECS is then considered by comparing the Nash costs $J_1(\gamma_1, \gamma_2)$ of Cases A and B (including the Stackelberg cost) with that of Case C.

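2.1. Nash solution of Case A

Under the condition

$$1 + q_1b_1^2 + q_2b_2^2 + \xi_A(q_1 - p_{12}q_2)b_1b_2 \neq 0 \qquad (7)$$

the unique Nash solution of Case A is given by

$$\gamma_{1A}(y_1) = -[1 + q_1b_1^2 + q_2b_2^2 + \xi_A(q_1 - p_{12}q_2)b_1b_2]^{-1}[q_1b_1 + \xi_A(q_1 - p_{12}q_2)b_2]\,h_1(1 + h_1^2)^{-1}y_1, \qquad (8a)$$

$$\gamma_{2A}(y_1, y_2) = -(1 + q_2b_2^2)^{-1}q_2b_2\big\{b_1\gamma_{1A}(y_1) + [h_1y_1 + h_2(y_2 - d\gamma_{1A}(y_1))](1 + h_1^2 + h_2^2)^{-1}\big\}, \qquad (8b)$$

where

$$\xi_A = -(1 + q_2b_2^2)^{-1}q_2b_2h_2d(1 + h_1^2 + h_2^2)^{-1}. \qquad (9)$$

Notice that $(\gamma_{1A}, \gamma_{2A})$ depends on $\xi_A$, which in turn depends on $d$. To different values of $d$ correspond different pairs $(\gamma_{1A}, \gamma_{2A})$, provided that (7) holds. Let us call M the class of all these solutions $(\gamma_{1A}, \gamma_{2A})_d$ for varying values of $d$.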

2.2. Nash solution of Case B (linear class)

There exist uncountably many Nash solutions for Case B, with the linear ones given by

$$\gamma_{1B}(y_1) = -[1 + q_1b_1^2 + q_2b_2^2 + \xi(q_1 - p_{12}q_2)b_1b_2]^{-1}[q_1b_1 + \xi(q_1 - p_{12}q_2)b_2]\,h_1(1 + h_1^2)^{-1}y_1, \qquad (10a)$$

$$\gamma_{2B}(y_1, y_2, u_1) = -(1 + q_2b_2^2)^{-1}q_2b_2\big\{b_1\gamma_{1B}(y_1) + (h_1y_1 + h_2y_2)(1 + h_1^2 + h_2^2)^{-1}\big\} + \xi(u_1 - \gamma_{1B}(y_1)), \qquad (10b)$$

where $\xi$ is any real number such that

$$1 + q_1b_1^2 + q_2b_2^2 + \xi(q_1 - p_{12}q_2)b_1b_2 \neq 0. \qquad (11)$$

Let us denote by L the class of all these linear solutions $(\gamma_{1B}, \gamma_{2B})_\xi$.

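2.3. Stackelberg solution of Case B

The Stackelberg solution with player 1 as the leader is denoted by $(\gamma_{1S}, \gamma_{2S})$ and is the following:

$$(\gamma_{1S}, \gamma_{2S}) = (\gamma_{1B}, \gamma_{2B})_{\xi = \xi_S}, \qquad (12)$$

where

$$\xi_S = -(1 + q_2b_2^2)^{-1}q_2b_1b_2. \qquad (13)$$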

2.4. Nash solution of Case C

The Nash solution of Case C is a special case of Case A with $\xi_A = 0$ in (8). It is also a special case of Case B with $\xi = 0$ in (10). Notice that (7) and (11) are satisfied when $\xi_A = \xi = 0$.
Fig. 1. Impact of ECS: $J_1(\gamma_{1B}, \gamma_{2B})$ as a function of $\xi$, where $\xi_S$ denotes the Stackelberg solution and 0 denotes the SIS Nash solution.

Fig. 2. Impact of ICS: $J_1(\gamma_{1A}, \gamma_{2A})$ as a function of $d$, where $d_S = (1 + h_1^2 + h_2^2)b_1/h_2$ and 0 denotes the SIS Nash solution.


Theorem 1.

(i) In L, the set of uncountably many linear Nash solutions under ECS, the unique local and global minimum of $J_1$ is given by $(\gamma_{1B}, \gamma_{2B})_{\xi = \xi_S}$, which is the Stackelberg solution.

(ii) Under ECS, player 1 can do better than under SIS if $\xi \in [\xi_S, 0)$.

(iii) Under ICS, player 1 can do better than under SIS if $d \in (0, (1 + h_1^2 + h_2^2)b_1/h_2]$.

Remark 1. This theorem shows that Uchida's claim, namely that the SIS solution is a local minimum of $J_1$ in L (remark 3.3(i) of [2]), is false.

Remark 2. This theorem indicates that, as should be expected in general, the Stackelberg solution is more beneficial to player 1 than all the other Nash solutions under ECS and SIS. It is not difficult to see that the Nash solution under ECS considered in theorem 2 of [1] is actually a Stackelberg solution.

Remark 3. This theorem and Fig. 1 give a general description of the impact of ECS on $J_1$, which includes the result of theorem 2 of [1] as one particular impact out of uncountably many.

Remark 4. The parameter $d$ in (3) can be regarded as a measure of the strength with which player 1 communicates his control implicitly to player 2. It can also be regarded as an incentive mechanism in a leader-follower situation: e.g., if the leader cannot communicate his control value to the follower free from noise, then by designing $d = (1 + h_1^2 + h_2^2)b_1/h_2$ in (3) and playing Nash (ICS), the leader can expect the same performance as in a Stackelberg game where the follower has perfect knowledge of the leader's control value.
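The design rule in Remark 4 can be traced through (9) and (13). Substituting $d_S = (1 + h_1^2 + h_2^2)b_1/h_2$ into (9) gives $\xi_A = \xi_S$, and since (8a) is (10a) with $\xi$ replaced by $\xi_A$, player 1's ICS strategy coincides with $\gamma_{1S}$. The second line is a sketch under an assumption not restated here: that the ICS measurement (3) has the form $y_2 = h_2x + du_1 + v_2$. Under that assumption, player 2's ICS reaction (8b) contains exactly the ECS term $\xi(u_1 - \gamma_{1B}(y_1))$ of (10b), with $\xi = \xi_A$.

$$\xi_A\big|_{d = d_S} = -\frac{q_2b_2h_2}{1 + q_2b_2^2}\cdot\frac{(1 + h_1^2 + h_2^2)b_1}{h_2}\cdot\frac{1}{1 + h_1^2 + h_2^2} = -\frac{q_2b_1b_2}{1 + q_2b_2^2} = \xi_S$$

$$h_2\big(y_2 - d\gamma_{1A}(y_1)\big) = h_2(h_2x + v_2) + h_2d\big(u_1 - \gamma_{1A}(y_1)\big) \;\Longrightarrow\; \gamma_{2A} = -\frac{q_2b_2}{1 + q_2b_2^2}\Big\{b_1\gamma_{1A}(y_1) + \frac{h_1y_1 + h_2(h_2x + v_2)}{1 + h_1^2 + h_2^2}\Big\} + \xi_A\big(u_1 - \gamma_{1A}(y_1)\big)$$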


4. Comments

In this section we comment on the impact of ECS on $J_1$. In the first part we explain part (i) of Theorem 1, i.e., why a local minimum of $J_1$ among L is given by the Stackelberg solution instead of the SIS solution, as claimed by Uchida. In the second part we explain part (ii) of Theorem 1, i.e., why player 1 can do better under ECS than under SIS for a continuous range of $\xi$.

Since $J_i(\gamma_1, \gamma_2)$ is quadratic in $\gamma_j$, $i, j = 1, 2$, $J_i(\gamma_1, \gamma_2)$ is differentiable with respect to $\gamma_j$. Furthermore, $\gamma_2$ is
Stackelberg solution (6) to hold [4]. Since $J_1$ and $J_2$ are convex in $\gamma_1$ and $\gamma_2$, the first-order necessary condition is also a sufficient condition for $\xi_S$ to give a Stackelberg solution.

It is remarkable that under ECS, although $J_1(\gamma_{1B}, \gamma_{2B})$ depends on the statistics of the observation noise, its ordering for different $\xi$ is independent of the noise, as we can see from (20). Thus, in order to explain the ordering of $J_1(\gamma_{1B}, \gamma_{2B})$ for different $\xi$, one need consider only the deterministic game. In Section 2, if there is no observation noise in (2)-(4), then for any given $u_1$ the optimal value of $u_2$ minimizing $J_2$ is determined uniquely by

$$\gamma_2(x, u_1) = -(1 + q_2b_2^2)^{-1}q_2b_2(b_1u_1 + x). \qquad (29)$$

The locus of such points $(u_1, u_2)$ given by (29) for all $u_1 \in R$ is called the reaction curve of player 2. The reaction curve of player 1 is similarly determined. Equicost contours of $J_1$ and $J_2$ and the reaction curves of both players are plotted in Fig. 3 for some particular values of the parameters of the game. The Nash solutions of Case B given by (10) now reduce to:

$$\gamma_{1B}(x) = -\{1 + q_1b_1^2 + q_2b_2^2 + \xi(q_1 - p_{12}q_2)b_1b_2\}^{-1}\{q_1b_1 + \xi(q_1 - p_{12}q_2)b_2\}x, \qquad (30a)$$

$$\gamma_{2B}(x, u_1) = -(1 + q_2b_2^2)^{-1}q_2b_2\{b_1\gamma_{1B}(x) + x\} + \xi(u_1 - \gamma_{1B}(x)), \qquad (30b)$$

for all $\xi \in R$ such that (11) holds. Notice that at each solution point of (30), the value of $u_2$ given by (30b) is equal to that determined by the strategy

$$\gamma_2(x, u_1) = -(1 + q_2b_2^2)^{-1}q_2b_2(b_1u_1 + x). \qquad (31)$$

Equation (31) is the same as (29), which means that all the Nash solution pairs $(\gamma_{1B}, \gamma_{2B})_\xi$ are on $R_2$, the reaction curve of player 2. Furthermore, since the set of values $u_1$ given by (30a) for all $\xi \in R$ such that (11) holds is the real line, we conclude that $R_2$ comprises all the linear Nash solutions of Case B. Point C in Fig. 3 represents $(\gamma_{1C}, \gamma_{2C}) = (\gamma_{1B}, \gamma_{2B})_{\xi = 0}$, the SIS solution, where $R_1$ and $R_2$ intersect. Point S represents $(\gamma_{1S}, \gamma_{2S}) = (\gamma_{1B}, \gamma_{2B})_{\xi = \xi_S}$, the Stackelberg solution, where $R_2$ is tangent to a contour of $J_1$ [1]. Fig. 3 shows clearly that point S gives a global minimum of $J_1$ on $R_2$ and that point C is by no means a local minimum of $J_1$ on $R_2$. All the points between C and S on $R_2$ yield a lower cost $J_1$ than point C. Finally,
Fig. 3. Illustration of the impact of ECS on $J_1$. $R_2$: reaction curve of player 2; $R_1$: reaction curve of player 1.
We study adaptive schemes for repeated quadratic Nash games in a deterministic and a stochastic framework. The convergence of the schemes is demonstrated under certain conditions.
1. INTRODUCTION

The object of this paper is the study of a static quadratic Nash game where the players do not know the parameters involved in the description of their opponents' costs or their opponents' information. The game is played repeatedly, and at each stage the players know the past actions of their opponents. The only dynamics involved are in the accumulation of information on the opponents' previous actions; apart from this dynamic aspect, the problem considered is a repeated static game.

We examine both the deterministic and the stochastic case, consider some adaptive schemes for updating the players' decisions, and show convergence to the optimal decisions (in the mean-square sense and with probability one for the stochastic case) under some conditions. The scheme for the stochastic case is a stochastic approximation algorithm of the Robbins-Monro type.

The underlying motivation for the present paper is to study situations of conflict where the players do not know some of the parameters involved in the description of the others' cost functionals, or in the state equation. Such situations have been and are being studied for the single-player case, i.e., the control problem, under the name of Adaptive Control; the corresponding problems for situations of conflict, i.e., Adaptive Games, have received very little attention up to now. The problem studied here can be considered as a very simple type of adaptive game where the players adapt their decisions so as to converge in the limit to the solution of a static Nash game. It should be noted that the strategies exhibited in this paper do not constitute a Nash equilibrium pair for the resulting dynamic game (dynamic because of the dynamic information); but similarly, the adaptive control strategy in the self-tuning regulator problem [5] converges in the limit to the optimal solution without necessarily being optimal at each stage.

Adaptive games are important for several reasons. For example, when two players are involved in a situation of conflict, it is reasonable to assume that each player knows his own objective, but not that of his opponent; in addition, he might not know several of the parameters of the dynamic system that couples him with the other. In decentralized control, we think of decentralization as a scheme according to which each controller knows his own objective and information but not those of the others. If each controller knew the objectives of the others, as is implicitly assumed in many existing decentralized schemes, the notion of decentralization would be weakened. Although considerable progress has been achieved for centralized, single-objective adaptive control [4]-[6], the area of adaptive games is in its infancy. The only works in this area that the author is familiar with are [7] and [8].

In [7], adaptive schemes based on self-tuning are considered for stochastic Nash and Stackelberg games in which the players have the same information. (In the present paper the information of the players is different.) In [8] two adaptive schemes are studied for repeated Stackelberg games in a deterministic framework.

The structure of the paper is as follows. In Section 2 we consider the deterministic case and study three simple adaptive schemes. In Section 3 we consider an adaptive scheme for the stochastic case. The stochastic scheme is a Robbins-Monro type of stochastic approximation algorithm. Although several results exist for such algorithms, many of which could be used to establish convergence of the scheme considered here, the convergence conditions they would give for our scheme are more stringent than those that we prove here.

In each section we provide several comments relating the results to previous work, expand on their meaning and provide appropriate motivation. Finally, we have a conclusions section.

2. DETERMINISTIC CASE


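Let $J_i: R^{m_1} \times R^{m_2} \to R$, $i = 1, 2$, be two functions defined by

$$J_i(u_1, u_2) = \tfrac{1}{2}u_i'u_i + u_i'R_iu_j + u_i'c_i, \quad i \neq j,\ i, j = 1, 2, \qquad (1)$$

where $u_i \in R^{m_i}$, $R_1$, $R_2$ are real constant matrices and $c_1$, $c_2$ are real constant vectors of appropriate dimensions. A pair $(u_1^*, u_2^*)$ is a Nash equilibrium if it satisfies ([1], [2]):

$$J_1(u_1^*, u_2^*) \le J_1(u_1, u_2^*), \quad \forall u_1 \in R^{m_1}, \qquad (2)$$

$$J_2(u_1^*, u_2^*) \le J_2(u_1^*, u_2), \quad \forall u_2 \in R^{m_2}, \qquad (3)$$

or, equivalently (since each $J_i$ is strictly convex in $u_i$, with gradient $u_i + R_iu_j + c_i$), if

$$R\begin{bmatrix} u_1^* \\ u_2^* \end{bmatrix} + c = 0, \quad R = \begin{bmatrix} I & R_1 \\ R_2 & I \end{bmatrix}, \quad c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}. \qquad (4)$$

$J_i$ and $u_i$ are the cost and the decision of player $i$.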

Let us assume that player $i$ knows $R_i$ and $c_i$, but not $R_j$ and $c_j$ ($j \neq i$); then he cannot solve (4) for $u^*$. Consider also that this game is played repeatedly at times $t = 1,2,3,\ldots$, that at time $t$ player $i$ knows $I_t = \{u_{1,1},\ldots,u_{1,t-1}, u_{2,1},\ldots,u_{2,t-1}\}$ and plays $u_{it}$, which is chosen as a function of $I_t$, i.e.,

$$u_{it} = F_i(I_t, t), \quad i = 1,2, \quad t = 2,3,\ldots \qquad (5)$$


The question is: for which $F_1$, $F_2$ does the recursion (5) converge to a solution of (4)? Let us now examine three possible choices of $F_1$, $F_2$.

Case  1 

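$$F_i(I_t,t) = -R_iu_{j,t-1} - c_i, \quad i = 1,2, \quad i \neq j \qquad (6)$$

The meaning of (6) is that player 1 minimizes $J_1(u_1, u_{2,t-1})$, i.e., he reacts only to the last announced decision of player 2. Recursion (5) assumes the form

$$u_t = u_{t-1} - (Ru_{t-1} + c), \quad u_t = \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}, \quad t \ge 2 \qquad (7)$$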
Recursion (7) will converge to a solution of (4) for any initial condition $(u_{1,1}, u_{2,1})$ if and only if all the eigenvalues of the matrix $R$ lie within the open disc of radius 1 centered at the point 1 in the complex plane, i.e.,

$$|\lambda(R) - 1| < 1 \qquad (8)$$

((8) is equivalent to $|\lambda(R_1R_2)| < 1$.) Condition (8) also guarantees that (4) has a unique solution.
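As a rough numerical illustration of Case 1, the following Python sketch (assuming NumPy; the helper names and the numbers chosen for $R_1$, $R_2$, $c_1$, $c_2$ are made up for the example) iterates scheme (6), which in stacked form is recursion (7), and compares condition (8) with the equivalent test $|\lambda(R_1R_2)| < 1$.

```python
import numpy as np

def stacked_R(R1, R2):
    """Matrix R of (4) for decisions u1 in R^{m1}, u2 in R^{m2}."""
    m1, m2 = R1.shape[0], R2.shape[0]
    return np.block([[np.eye(m1), R1], [R2, np.eye(m2)]])

def best_response_play(R1, R2, c1, c2, u1, u2, steps=200):
    """Scheme (6): each player best-responds to the opponent's last decision.
    Stacked, this is recursion (7): u_t = u_{t-1} - (R u_{t-1} + c)."""
    for _ in range(steps):
        # simultaneous update: both players use the stage t-1 decisions
        u1, u2 = -R1 @ u2 - c1, -R2 @ u1 - c2
    return u1, u2

def condition_8(R1, R2):
    """|lambda(R) - 1| < 1 for every eigenvalue of R."""
    return bool(np.all(np.abs(np.linalg.eigvals(stacked_R(R1, R2)) - 1.0) < 1.0))

def condition_8_via_R1R2(R1, R2):
    """Equivalent form: every eigenvalue of R1 R2 has modulus below one."""
    return bool(np.all(np.abs(np.linalg.eigvals(R1 @ R2)) < 1.0))

# Illustrative scalar players with r1 * r2 = 0.4 < 1
R1, R2 = np.array([[0.5]]), np.array([[0.8]])
c1, c2 = np.array([1.0]), np.array([-2.0])
u1, u2 = best_response_play(R1, R2, c1, c2, np.zeros(1), np.zeros(1))
u_star = np.linalg.solve(stacked_R(R1, R2), -np.concatenate([c1, c2]))
print(np.concatenate([u1, u2]), u_star)   # the two vectors agree
```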


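Case 2

$$F_i(I_t,t) = -R_i\,\frac{1-\theta}{1-\theta^{t-1}}\sum_{k=1}^{t-1}\theta^{t-1-k}u_{j,k} - c_i, \quad 1 > \theta > 0, \quad i = 1,2, \quad i \neq j \qquad (9)$$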


The meaning of (9) is that player 1 minimizes $J_1$ with respect to $u_1$, with $u_2$ fixed to a value that is a weighted average of $u_{2,t-1},\ldots,u_{2,1}$, where more weight is put on the recent values of $u_2$. We assume that both players use the same $\theta$. Recursion (9) can be written equivalently:


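$$\hat u_t = \hat u_{t-1} - \frac{1-\theta}{1-\theta^{t-1}}\left(R\hat u_{t-1} + c\right), \quad t \ge 3 \qquad (10)$$

where $\hat u_t = \frac{1-\theta}{1-\theta^{t-1}}\sum_{k=1}^{t-1}\theta^{t-1-k}u_k$ is the weighted average of the past decisions to which the players react at time $t$, so that $u_t = \hat u_t - (R\hat u_t + c)$.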

Recursion (10) will converge to a solution of (4) for any initial condition $(u_{1,1}, u_{2,1})$ if and only if all the eigenvalues of the matrix $R$ lie within the open disc of radius $(1-\theta)^{-1}$ centered at the point $(1-\theta)^{-1}$ in the complex plane, i.e.,

$$\left|\lambda(R) - \frac{1}{1-\theta}\right| < \frac{1}{1-\theta} \qquad (11)$$

Condition (11) also guarantees that (4) has a unique solution. (Notice that as $t \to +\infty$, $\theta^{t-1} \to 0$ and thus $(1-\theta)R$ in (10) assumes the role of $R$ in (7).) Obviously, for $\theta = 0$, (11) reduces to (8) and (10) to (7).
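The sketch below (NumPy assumed; all names and numbers are illustrative) implements scheme (9) directly from its definition, recomputing the geometric weights at each stage, and checks condition (11). The example is chosen so that $r_1r_2 = -1.5$: (8) fails, so Case 1 would not converge, while (11) holds for $\theta = 0.5$.

```python
import numpy as np

def weighted_average_play(R1, R2, c1, c2, u1, u2, theta, steps=300):
    """Scheme (9): at stage t each player best-responds to the average of the
    opponent's past decisions u_{j,1}, ..., u_{j,t-1} with weights theta^{t-1-k}."""
    H1 = [np.asarray(u1, dtype=float)]   # history of player 1's decisions
    H2 = [np.asarray(u2, dtype=float)]   # history of player 2's decisions
    for t in range(2, steps + 2):
        k = np.arange(1, t)                    # past stages k = 1, ..., t-1
        w = float(theta) ** (t - 1 - k)        # more weight on recent stages when theta < 1
        w = w / w.sum()                        # normalising factor (1-theta)/(1-theta^{t-1})
        avg1, avg2 = w @ np.array(H1), w @ np.array(H2)
        H1.append(-R1 @ avg2 - c1)
        H2.append(-R2 @ avg1 - c2)
    return H1[-1], H2[-1]

def condition_11(R1, R2, theta):
    """Eigenvalues of R inside the disc of radius 1/(1-theta) centred at 1/(1-theta)."""
    m1, m2 = R1.shape[0], R2.shape[0]
    R = np.block([[np.eye(m1), R1], [R2, np.eye(m2)]])
    a = 1.0 / (1.0 - theta)
    return bool(np.all(np.abs(np.linalg.eigvals(R) - a) < a))

R1, R2 = np.array([[1.5]]), np.array([[-1.0]])   # r1 * r2 = -1.5, so (8) fails
c1, c2 = np.array([1.0]), np.array([0.5])
u1, u2 = weighted_average_play(R1, R2, c1, c2, np.zeros(1), np.zeros(1), theta=0.5)
u_star = np.linalg.solve(np.array([[1.0, 1.5], [-1.0, 1.0]]), -np.array([1.0, 0.5]))
print(np.concatenate([u1, u2]), u_star)
```

Setting `theta=0.0` leaves a single weight of 1 on the last stage, which is scheme (6); `condition_11(R1, R2, 0.0)` then coincides with (8).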


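Case 3

$$F_i(I_t,t) = -R_i\,\frac{u_{j,t-1} + u_{j,t-2} + \cdots + u_{j,1}}{t-1} - c_i, \quad i = 1,2, \quad i \neq j \qquad (12)$$

The meaning of (12) is that player 1 minimizes $J_1$ with respect to $u_1$, with $u_2$ fixed to the arithmetic mean of $u_{2,t-1},\ldots,u_{2,1}$. Recursion (12) can be written equivalently:

$$\bar u_t = \bar u_{t-1} - \frac{1}{t-1}\left(R\bar u_{t-1} + c\right), \quad t \ge 3 \qquad (13)$$

where $\bar u_t = \frac{1}{t-1}\sum_{k=1}^{t-1}u_k$, so that $u_t = \bar u_t - (R\bar u_t + c)$.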


Recursion (13) will converge to a solution of (4) for any initial condition $(u_{1,1}, u_{2,1})$ if and only if all the eigenvalues of $R$ have positive real parts, i.e.,

$$\mathrm{Re}\,\lambda(R) > 0 \qquad (14)$$

(For the proof see Appendix A, Lemma A3.) Condition (14) also guarantees that (4) has a unique solution. Notice that as $\theta \to 1$, (11) reduces to (14).

Remark 1 Obviously (8) $\Rightarrow$ (11) $\Rightarrow$ (14). If (8) holds, (7) converges faster than (10), and if (11) holds, (10) converges faster than (13).
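A minimal sketch of scheme (12), assuming NumPy and illustrative values: running sums keep each stage cheap, and the example has eigenvalues $1 \pm 3i$ for $R$, so (14) holds while (11) with $\theta = 0.5$ does not. Convergence here is slow, roughly like $t^{-1}$, in line with Remark 1.

```python
import numpy as np

def mean_play(R1, R2, c1, c2, u1, u2, steps):
    """Scheme (12): at stage t each player best-responds to the arithmetic mean
    of the opponent's decisions u_{j,1}, ..., u_{j,t-1}."""
    s1 = np.array(u1, dtype=float)    # running sum of player 1's decisions
    s2 = np.array(u2, dtype=float)    # running sum of player 2's decisions
    n = 1                             # number of stages summed so far
    for _ in range(steps):
        u1 = -R1 @ (s2 / n) - c1
        u2 = -R2 @ (s1 / n) - c2
        s1 += u1
        s2 += u2
        n += 1
    return u1, u2

def eigenvalues_of_R(R1, R2):
    m1, m2 = R1.shape[0], R2.shape[0]
    return np.linalg.eigvals(np.block([[np.eye(m1), R1], [R2, np.eye(m2)]]))

R1, R2 = np.array([[3.0]]), np.array([[-3.0]])   # eigenvalues of R are 1 +/- 3i
c1, c2 = np.array([1.0]), np.array([0.5])
lam = eigenvalues_of_R(R1, R2)
print(np.all(lam.real > 0))                       # True: (14) holds
print(np.all(np.abs(lam - 2.0) < 2.0))            # False: (11) with theta = 0.5 fails
u1, u2 = mean_play(R1, R2, c1, c2, np.zeros(1), np.zeros(1), steps=20000)
```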

Remark 2 In all three cases we assumed that both players use the same scheme. Nonetheless, they might use different ones. It is easy to verify that if player 1 uses scheme 1 and player 2 uses scheme 2, the region of convergence is larger than if both were using scheme 1 and smaller than if both were using scheme 2. Similar results hold for the other combinations.

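Remark 3 If we consider (10) with $\theta > 1$, i.e., more weight is assigned to the older decisions, the scheme will not converge. This can easily be verified by considering the scalar version of (10) ($R = r$ scalar) with $c = 0$:

$$u_t = u_{t-1}\left(1 - r\,\frac{1-\theta}{1-\theta^{t-1}}\right)$$

which for $t \to +\infty$ behaves like

$$u_t = u_{t-1}\left(1 - r(\theta-1)\gamma^{t-1}\right), \quad \gamma = \theta^{-1}$$

(since $0 < \gamma < 1$). Because $\sum_t \gamma^{t-1} < +\infty$, the product of these factors converges to a nonzero limit (unless one of them vanishes), so $u_t$ fails to converge to the solution $u^* = 0$.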
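The scalar recursion of Remark 3 can be checked in a few lines of plain Python; the values $r = 0.5$ and $\theta \in \{2, 0.5\}$ are arbitrary choices for illustration, and `scalar_scheme_10` is a hypothetical helper name.

```python
def scalar_scheme_10(r, theta, u_init=1.0, steps=200):
    """Scalar version of (10) with c = 0: u_t = u_{t-1} (1 - r (1-theta)/(1-theta^{t-1}))."""
    u = u_init
    for t in range(3, steps + 3):
        gain = (1.0 - theta) / (1.0 - theta ** (t - 1))
        u *= 1.0 - r * gain
    return u

# theta > 1: the gains shrink geometrically, like (theta-1) * theta^{-(t-1)},
# so the product of the factors settles at a nonzero value instead of u* = 0.
print(scalar_scheme_10(0.5, 2.0, steps=100), scalar_scheme_10(0.5, 2.0, steps=200))
print(scalar_scheme_10(0.5, 0.5))   # theta < 1: tends to the solution u* = 0
```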


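Remark 4 (8), (11) and (14) can be expressed equivalently in terms of the eigenvalues of $R_1R_2$: (8) corresponds to eigenvalues of $R_1R_2$ inside the unit circle, (11) to eigenvalues of $R_1R_2$ inside the curve $C_2$ of Fig. 1, and (14) to eigenvalues of $R_1R_2$ inside the parabola defined by

$$\mathrm{Re}\,\lambda + \tfrac{1}{4}(\mathrm{Im}\,\lambda)^2 < 1, \quad \lambda = \lambda(R_1R_2).$$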
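Since the eigenvalues of $R$ are $1 \pm \sqrt{\mu}$, with $\mu$ ranging over the eigenvalues of $R_1R_2$ (plus eigenvalues equal to 1 when $R_1$ is not square), the parabola test can be compared with (14) numerically. A sketch assuming NumPy, with hypothetical helper names:

```python
import numpy as np

def stacked_R(R1, R2):
    m1, m2 = R1.shape[0], R2.shape[0]
    return np.block([[np.eye(m1), R1], [R2, np.eye(m2)]])

def condition_14_direct(R1, R2):
    """(14): every eigenvalue of R has a positive real part."""
    return bool(np.all(np.linalg.eigvals(stacked_R(R1, R2)).real > 0))

def inside_parabola(mu):
    """Re(mu) + (Im(mu))^2 / 4 < 1 for an eigenvalue mu of R1 R2."""
    return mu.real + 0.25 * mu.imag ** 2 < 1.0

def condition_14_via_R1R2(R1, R2):
    return bool(all(inside_parabola(mu) for mu in np.linalg.eigvals(R1 @ R2)))

# r1 r2 = 2 lies outside the parabola, and R = [[1, 2], [1, 1]] has eigenvalue 1 - sqrt(2) < 0
print(condition_14_direct(np.array([[2.0]]), np.array([[1.0]])),
      condition_14_via_R1R2(np.array([[2.0]]), np.array([[1.0]])))   # False False
```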

Remark 5 If (8) (or equivalently $|\lambda(R_1R_2)| < 1$) holds, the solution of (4) is called in game theory a stable equilibrium, and the game is called stable [1]. The reason is that if player $i$ deviates from $u_i^*$, then player $j$ ($j \neq i$) responds according to scheme (6), to which player $i$ responds according to scheme (6), and so on, and eventually they both converge back to $(u_1^*, u_2^*)$. Obviously the notion of stable equilibrium depends on the reaction scheme that the players employ. If schemes (9) or (12) are used as reaction schemes, we have an enlarged class of stable games.

Remark  6  Since  the  scheme  of  case  3  (12)  has  the  best  convergence  region 
out  of  the  three  schemes,  in  the  next  section  we  will  deal  with  the  stochastic 
analogue  of  (12). 

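Remark 7 All three schemes considered can actually be viewed as schemes for solving $Ru + c = 0$ (see (4)) by using an iteration of the form

$$v_{n+1} = v_n - D_n(Rv_n + c) \qquad (15)$$

where $D_n$ has to have the block-diagonal structure

$$D_n = \begin{bmatrix} D_n^1 & 0 \\ 0 & D_n^2 \end{bmatrix}$$

since each player can only adjust his own decision. (The iterative solution of linear equations is a vast subject; see, e.g., [16].) Scheme (13) employed $D_n = \frac{1}{n}I$. We can create new schemes which converge under weaker conditions than (14) by allowing $D_n = \frac{1}{n}\begin{bmatrix} D^1 & 0 \\ 0 & D^2 \end{bmatrix}$, where $D^1$, $D^2$ are properly chosen constant matrices. For example, if $R_1 = r_1$, $R_2 = r_2$ are scalars, (14) is equivalent to $1 > r_1r_2$; but if we use $D^i = d_i$ in (15), the convergence condition becomes

$$\mathrm{Re}\,\lambda\begin{bmatrix} d_1 & d_1r_1 \\ d_2r_2 & d_2 \end{bmatrix} > 0$$

which (since a real $2 \times 2$ matrix has both eigenvalues in the open right half-plane if and only if its trace and its determinant are positive) is equivalent to:

$$d_1 + d_2 > 0, \qquad d_1d_2(1 - r_1r_2) > 0$$

and can always be satisfied for some $d_1$, $d_2$ as long as $1 \neq r_1r_2$. Notice that $1 \neq r_1r_2$ is the necessary and sufficient condition for solvability of (4) for any $c$.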
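The scalar example of Remark 7 can be reproduced with the following sketch (NumPy assumed; the values $r_1 = 2$, $r_2 = 1$, $d_1 = 4$, $d_2 = -2$ and $c$ are illustrative choices, not taken from the paper, and the function names are hypothetical). Here $r_1r_2 = 2 > 1$, so (14) fails for $D_n = \frac{1}{n}I$, but $d_1 + d_2 > 0$ and $d_1d_2(1 - r_1r_2) > 0$.

```python
import numpy as np

def iterate_15(R, c, D, v0, steps):
    """Iteration (15) with D_n = D / n: v_{n+1} = v_n - (1/n) D (R v_n + c).
    D is diagonal, so each player only rescales the update of his own decision."""
    v = np.array(v0, dtype=float)
    for n in range(1, steps + 1):
        v = v - (D @ (R @ v + c)) / n
    return v

def converges(R, D):
    """Lemma A3 criterion for (15): all eigenvalues of D R in the open right half-plane."""
    return bool(np.all(np.linalg.eigvals(D @ R).real > 0))

r1, r2 = 2.0, 1.0                       # r1 * r2 = 2 > 1, so (14) fails
R = np.array([[1.0, r1], [r2, 1.0]])
c = np.array([1.0, 1.0])
D = np.diag([4.0, -2.0])                # d1 + d2 = 2 > 0, d1 d2 (1 - r1 r2) = 8 > 0
print(converges(R, np.eye(2)), converges(R, D))   # False True
v = iterate_15(R, c, D, np.zeros(2), steps=50000)
print(v, np.linalg.solve(R, -c))        # close to the solution (-1, 0)
```

Note that $d_2 < 0$: player 2 moves against his own best-response correction, which is what the sign condition $d_1d_2(1 - r_1r_2) > 0$ requires when $r_1r_2 > 1$.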

Remark  8  Another  way  of  going  about  the  problem  of  this  section  is  to 
consider  that  at  each  stage,  each  player  uses  a  certain  scheme  to  estimate  the 
R and c of his opponent and then calculates his action by solving (4), wherein he employs the estimates of the R and c of his opponent. In such a scheme, each player should know at each stage not only the previous actions of his opponent (as in our scheme) but also the rationale according to which his opponent calculates his actions. This is necessary just to estimate his opponent's parameters at each stage. Nonetheless, such additional knowledge can be permitted and the convergence of the resulting scheme studied. Finally, it should be noted that the problem considered here
and  the  schemes  proposed,  besides  having  their  own  merit,  provide  a 
certain  motivation  for  the  schemes  considered  for  the  stochastic  case 
of  the  next  section. 


3.  THE  STOCHASTIC  CASE 


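Let $x$ be a Gaussian random vector in $\mathbb{R}^n$ with zero mean and unit covariance matrix. Let

$$y_i = C_ix, \quad i = 1,2 \qquad (16)$$

represent the measurements of the two players, where $C_1$, $C_2$ are fixed real matrices of dimensions $n_1 \times n$, $n_2 \times n$, respectively. Let $\Gamma_i$ be the set of all measurable functions $\gamma_i : \mathbb{R}^{n_i} \to \mathbb{R}^{m_i}$ with $E[\gamma_i'(y_i)\gamma_i(y_i)] < +\infty$. Set $u_i = \gamma_i(y_i)$ and let

$$J_i(\gamma_1,\gamma_2) = E\left[\tfrac{1}{2}u_i'u_i + u_i'R_iu_j + u_i'S_ix\right], \quad i \neq j, \quad i,j = 1,2 \qquad (17)$$

represent the costs of the two players. $R_1$, $R_2$, $S_1$, $S_2$ are fixed real matrices of appropriate dimensions. A pair $(\gamma_1^*, \gamma_2^*) \in \Gamma_1 \times \Gamma_2$ is called a Nash equilibrium if it satisfies

$$J_1(\gamma_1^*,\gamma_2^*) \le J_1(\gamma_1,\gamma_2^*) \quad \forall \gamma_1 \in \Gamma_1$$

$$J_2(\gamma_1^*,\gamma_2^*) \le J_2(\gamma_1^*,\gamma_2) \quad \forall \gamma_2 \in \Gamma_2 \qquad (18)$$

For background concerning the formulation of the stochastic Nash game see [18]. (18) is equivalent to (see [2,3]):

$$\gamma_1^*(y_1) + R_1E[\gamma_2^*(y_2)\,|\,y_1] + S_1E[x\,|\,y_1] = 0 \qquad (19a)$$

$$\gamma_2^*(y_2) + R_2E[\gamma_1^*(y_1)\,|\,y_2] + S_2E[x\,|\,y_2] = 0 \qquad (19b)$$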


It is known (see [3]) that if no eigenvalue of $R_1R_2$ equals the inverse of any arbitrary but finite product of powers of the squares of the canonical correlation coefficients of $y_1$, $y_2$ (i.e., of $\sigma_1^2, \sigma_2^2, \ldots$), then (19) has a unique solution, which has to be linear in the information. The set of values where the eigenvalues of $R_1R_2$ should not lie is a countable isolated set of points in $[1,+\infty)$, and thus it is generically true that (19) admits a unique solution which has to be linear in the information. We can assume without loss of generality (see Lemma 1 of [3]) that

$$n_1 \le n_2, \quad C_1C_1' = I_{n_1}, \quad C_2C_2' = I_{n_2}, \quad C_1C_2' = \begin{bmatrix} \Sigma & 0 \end{bmatrix}, \quad \Sigma = \mathrm{diag}(\sigma_1,\ldots,\sigma_{n_1}), \quad 1 \ge \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{n_1} \ge 0 \qquad (20)$$

where $C_1C_2'$ is $n_1 \times n_2$, and then $\gamma_i^*(y_i) = L_iy_i$, where $L_1$, $L_2$ are the solutions to the system:

$$L_1 + R_1L_2C_2C_1' + S_1C_1' = 0$$

$$L_2 + R_2L_1C_1C_2' + S_2C_2' = 0 \qquad (21)$$
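System (21) is linear in $(L_1, L_2)$, so it can be solved by stacking columns with the identity $\mathrm{vec}(AXB) = (B' \otimes A)\,\mathrm{vec}(X)$. A minimal sketch assuming NumPy; the function name and the example matrices (chosen to satisfy (20) with $n = 3$, $n_1 = 1$, $n_2 = 2$, $\sigma_1 = 0.6$) are hypothetical.

```python
import numpy as np

def vec(X):
    return X.reshape(-1, order="F")   # column-stacking vec(.)

def solve_21(R1, R2, S1, S2, C1, C2):
    """Solve (21): L1 + R1 L2 C2 C1' + S1 C1' = 0, L2 + R2 L1 C1 C2' + S2 C2' = 0."""
    m1, n1 = R1.shape[0], C1.shape[0]
    m2, n2 = R2.shape[0], C2.shape[0]
    P12 = C2 @ C1.T                   # n2 x n1, multiplies L2 in the first equation
    P21 = C1 @ C2.T                   # n1 x n2, multiplies L1 in the second equation
    top = np.hstack([np.eye(m1 * n1), np.kron(P12.T, R1)])
    bot = np.hstack([np.kron(P21.T, R2), np.eye(m2 * n2)])
    rhs = -np.concatenate([vec(S1 @ C1.T), vec(S2 @ C2.T)])
    z = np.linalg.solve(np.vstack([top, bot]), rhs)
    L1 = z[: m1 * n1].reshape(m1, n1, order="F")
    L2 = z[m1 * n1:].reshape(m2, n2, order="F")
    return L1, L2

C1 = np.array([[1.0, 0.0, 0.0]])
C2 = np.array([[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]])   # C1 C2' = [0.6, 0]
R1, R2 = np.array([[0.5]]), np.array([[0.8]])
S1, S2 = np.array([[1.0, 0.0, 0.0]]), np.array([[0.0, 1.0, 1.0]])
L1, L2 = solve_21(R1, R2, S1, S2, C1, C2)
print(L1, L2)
```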


Let us assume that player $i$ knows $R_i$, $S_i$, but not $R_j$, $S_j$, $C_j$, $i \neq j$; then he cannot solve (21) for $L_i$. Consider also that this game is played repeatedly at times $t = 1,2,3,\ldots$, that at time $t$ player $i$ knows

$$I_t^i = \{u_{1,1},\ldots,u_{1,t-1},\, u_{2,1},\ldots,u_{2,t-1},\, y_{i,1},\ldots,y_{i,t}\} \qquad (22)$$

where $y_{it}$ is the measurement of player $i$ at time $t$. We assume that

$$y_{it} = C_ix_t \qquad (23)$$

where the $x_t$'s are independent Gaussian vectors with zero mean and unit covariance. At time $t$, player 1 employs the following scheme for finding $u_{1t}$:

$$u_{1t} + R_1\left(\frac{1}{t-1}\sum_{k=1}^{t-1}u_{2,k}y_{1,k}'\right)y_{1t} + S_1C_1'y_{1t} = 0 \qquad (24)$$


A justification of this scheme is the following: at time $t$ player 1 has to solve (19a) for $u_{1t}$, and thus he has to calculate $E[u_{2t}\,|\,y_{1t}]$ and $E[x_t\,|\,y_{1t}]$. If $u_{2t}$ is linear in $y_{2t}$, then $u_{2t}$, $y_{1t}$ are jointly Gaussian and thus

$$E[u_{2t}\,|\,y_{1t}] = E[u_{2t}y_{1t}']\left(E[y_{1t}y_{1t}']\right)^{-1}y_{1t} \qquad (25)$$

Player 1 approximates $E[u_{2t}y_{1t}']$ by $\frac{1}{t-1}\sum_{k=1}^{t-1}u_{2k}y_{1k}'$; a motivation for this approximation is the following. If player 1 knew all the parameters of (16), (17), he would then solve equation (19) at stage $t$, employing (23); due to the independence of the $x_t$'s, $\frac{1}{t-1}\sum_{k=1}^{t-1}u_{2k}y_{1k}'$ would provide a reasonable approximation of $E[u_{2t}y_{1t}']$, since $u_{2k}y_{1k}'$ would be independent of $u_{2\ell}y_{1\ell}'$, $\ell \neq k$. Overlooking the fact that $u_{2k}y_{1k}'$ and $u_{2\ell}y_{1\ell}'$, $\ell \neq k$, are not independent, he still employs the above approximation, hoping that things will work out. The convergence results of Theorems 1' and 2' provide an a posteriori justification for the reasonableness of this approximation.

By our assumption (20), $E[y_{1t}y_{1t}'] = I$ and $E[x_t\,|\,y_{1t}] = C_1'y_{1t}$; (24) then yields that $u_{1t}$ is linear in $y_{1t}$, i.e., $u_{1t} = L_{1t}y_{1t}$, where $L_{1t}$ satisfies

$$L_{1t} + R_1\frac{1}{t-1}\sum_{k=1}^{t-1}u_{2k}y_{1k}' + S_1C_1' = 0 \qquad (26)$$

A similar equation is satisfied by $L_{2t}$ if we consider that $u_{2t}$ is calculated by an equation corresponding to (24) and $u_{2t} = L_{2t}y_{2t}$. The equations for $L_{1t}$, $L_{2t}$ can be written recursively as:

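$$L_{1t} = L_{1,t-1} - \frac{1}{t-1}\left[L_{1,t-1} + R_1L_{2,t-1}y_{2,t-1}y_{1,t-1}' + S_1C_1'\right] \qquad (27a)$$

$$L_{2t} = L_{2,t-1} - \frac{1}{t-1}\left[L_{2,t-1} + R_2L_{1,t-1}y_{1,t-1}y_{2,t-1}' + S_2C_2'\right] \qquad (27b)$$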
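To illustrate the convergence studied in the rest of this section, the following sketch simulates recursion (27) for illustrative data ($n = 3$, $n_1 = 1$, $n_2 = 2$, $\sigma_1 = 0.6$, $R_1 = 0.5$, $R_2 = 0.8$). It assumes NumPy, draws the $x_t$ i.i.d. $N(0, I)$ as in (23), and starts from $L_1 = L_2 = 0$; `run_27` is a hypothetical name. For these values the matrix in (33) has eigenvalues $1 \pm 0.6\sqrt{0.4}$, and solving (21) by hand gives $L_1 \approx -0.888$, $L_2 \approx [-0.374,\ -1]$.

```python
import numpy as np

def run_27(R1, R2, S1, S2, C1, C2, T, seed=0):
    """Simulate recursion (27) for t = 2, ..., T from the initial condition L1 = L2 = 0."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((T, C1.shape[1]))   # x_t i.i.d. N(0, I), as in (23)
    Y1, Y2 = X @ C1.T, X @ C2.T                 # rows are y_{1t}' and y_{2t}'
    S1C1, S2C2 = S1 @ C1.T, S2 @ C2.T
    L1 = np.zeros((R1.shape[0], C1.shape[0]))
    L2 = np.zeros((R2.shape[0], C2.shape[0]))
    for t in range(2, T + 1):
        y1, y2 = Y1[t - 2], Y2[t - 2]             # measurements of stage t-1
        g = 1.0 / (t - 1)
        L1_next = L1 - g * (L1 + R1 @ np.outer(L2 @ y2, y1) + S1C1)
        L2_next = L2 - g * (L2 + R2 @ np.outer(L1 @ y1, y2) + S2C2)
        L1, L2 = L1_next, L2_next                 # both players update simultaneously
    return L1, L2

C1 = np.array([[1.0, 0.0, 0.0]])
C2 = np.array([[0.6, 0.8, 0.0], [0.0, 0.0, 1.0]])
R1, R2 = np.array([[0.5]]), np.array([[0.8]])
S1, S2 = np.array([[1.0, 0.0, 0.0]]), np.array([[0.0, 1.0, 1.0]])
L1, L2 = run_27(R1, R2, S1, S2, C1, C2, T=20000)
print(L1, L2)
```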


(27) is the recursion that we intend to study; we will show that under some conditions it converges to the solution of (21) in the q.m. sense and w.p.1. The initial condition of (27) is taken to be an arbitrary pair of real constant matrices and we are interested in convergence for any initial condition. (27) defines a Markovian stochastic process $(L_{1t}, L_{2t})$ and is obviously a stochastic approximation algorithm of the Robbins-Monro type [9] for solving (21). (27) is the stochastic analogue of the scheme of case 3 of the deterministic case.

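Let us now study the convergence of (27). Let us call $\ell_{it}$, $m_{it}$, $c_i$, $d_i$ the $i$-th columns of $L_{1t}$, $L_{2t}$, $S_1C_1'$, $S_2C_2'$, respectively, i.e.,

$$L_{1t} = [\ell_{1t},\ldots,\ell_{n_1t}], \quad L_{2t} = [m_{1t},\ldots,m_{n_2t}], \quad S_1C_1' = [c_1,\ldots,c_{n_1}], \quad S_2C_2' = [d_1,\ldots,d_{n_2}] \qquad (28)$$

Using (20) and the fact that $L_{1t}$, $L_{2t}$ depend only on $y_{1,1},\ldots,y_{1,t-1}$, $y_{2,1},\ldots,y_{2,t-1}$, we obtain from (27):

$$E\ell_{it} = E\ell_{i,t-1} - \frac{1}{t-1}\left[E\ell_{i,t-1} + \sigma_iR_1Em_{i,t-1} + c_i\right], \quad i = 1,\ldots,n_1 \qquad (30a)$$

$$Em_{it} = Em_{i,t-1} - \frac{1}{t-1}\left[Em_{i,t-1} + \sigma_iR_2E\ell_{i,t-1} + d_i\right], \quad i = 1,\ldots,n_1 \qquad (30b)$$

$$Em_{it} = Em_{i,t-1} - \frac{1}{t-1}\left[Em_{i,t-1} + d_i\right], \quad i = n_1+1,\ldots,n_2 \qquad (30c)$$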


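Recursion (30c) converges for any initial condition (see Lemma A3). (30a) and (30b) can be written as

$$\begin{bmatrix} E\ell_{it} \\ Em_{it} \end{bmatrix} = \begin{bmatrix} E\ell_{i,t-1} \\ Em_{i,t-1} \end{bmatrix} - \frac{1}{t-1}\left(\begin{bmatrix} I & \sigma_iR_1 \\ \sigma_iR_2 & I \end{bmatrix}\begin{bmatrix} E\ell_{i,t-1} \\ Em_{i,t-1} \end{bmatrix} + \begin{bmatrix} c_i \\ d_i \end{bmatrix}\right), \quad i = 1,\ldots,n_1 \qquad (31)$$

and using Lemma A3 yields that (31) converges for any initial condition if and only if

$$\mathrm{Re}\,\lambda\begin{bmatrix} I & \sigma_iR_1 \\ \sigma_iR_2 & I \end{bmatrix} > 0, \quad i = 1,\ldots,n_1 \qquad (32)$$

It is easy to see that if (32) holds for $\sigma_1$ then it holds for any $\sigma_i$: the eigenvalues of the matrix in (32) are of the form $1 \pm \sigma_i\sqrt{\lambda(R_1R_2)}$ (or 1), and $\sigma_i \le \sigma_1$ by (20).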

We thus have proven:

Theorem 1' The means of $L_{1t}$, $L_{2t}$ as defined by the recursion (27) converge to a solution of (21) for any initial condition if and only if

$$\mathrm{Re}\,\lambda\begin{bmatrix} I & \sigma_1R_1 \\ \sigma_1R_2 & I \end{bmatrix} > 0 \qquad (33)$$

It is easy to see that if (33) holds then (21) has a unique solution. If we want (27) to converge to a solution of (21) not only for any initial condition, but also for any pair of measurements, i.e., any $C_1$, $C_2$, we have to consider $\sigma_1 = 1$ in (33), which is exactly the condition for convergence of case 3 of the deterministic case.
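Condition (33) is easy to test numerically. The sketch below (NumPy assumed, illustrative scalar values, hypothetical function name) shows a case with $r_1r_2 = 2$ where (33) holds for $\sigma_1 = 0.5$ but fails for $\sigma_1 = 1$, i.e., for the deterministic Case 3 condition (14).

```python
import numpy as np

def condition_33(R1, R2, sigma1):
    """Re lambda([[I, sigma1 R1], [sigma1 R2, I]]) > 0, i.e. condition (33)."""
    m1, m2 = R1.shape[0], R2.shape[0]
    M = np.block([[np.eye(m1), sigma1 * R1], [sigma1 * R2, np.eye(m2)]])
    return bool(np.all(np.linalg.eigvals(M).real > 0))

R1, R2 = np.array([[2.0]]), np.array([[1.0]])   # r1 * r2 = 2
print(condition_33(R1, R2, 0.5))   # True: eigenvalues 1 +/- 0.5 * sqrt(2)
print(condition_33(R1, R2, 1.0))   # False: sigma1 = 1 gives (14), which fails here
```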


Next we will show that $L_{1t}$, $L_{2t}$ converge to a solution of (21) in the mean square sense under condition (33). For simplicity and w.l.o.g. we will assume $S_1C_1' = 0$, $S_2C_2' = 0$. We can write (27) componentwise in terms of $\ell_{it}$, $m_{it}$ and then form the products $\ell_{it}\ell_{jt}'$, $i,j = 1,\ldots,n_1$, $m_{it}m_{jt}'$, $i,j = 1,\ldots,n_2$, and $\ell_{it}m_{jt}'$, $i = 1,\ldots,n_1$, $j = 1,\ldots,n_2$. These products satisfy recursions that can easily be calculated, and taking their expectations results in a recursion which gives the evolution of $E(\ell_{it}\ell_{jt}')$, $E(m_{it}m_{jt}')$, $E(\ell_{it}m_{jt}')$ in terms of their values at time $t-1$. Before writing down this recursion, we collect these second moments in a matrix $N_t$. Then $N_t$ satisfies:

$$N_t = N_{t-1} - \frac{1}{t-1}\left[N_{t-1}Q' + QN_{t-1}\right] + \frac{1}{(t-1)^2}\mathcal{A}(N_{t-1}) \qquad (37)$$

where $Q$ is a constant matrix and $\mathcal{A}(\cdot)$ denotes a linear time-invariant function of its argument. (For details of this derivation, see Appendix B.)

Using Lemma A4 we conclude that $N_t$ goes to zero for any initial condition if and only if the matrix $Q$ has eigenvalues with positive real parts, which is easily seen to be equivalent to (33). We thus have proven

Theorem 2' $L_{1t}$, $L_{2t}$ as defined by recursion (27) converge to a solution of (21) for any initial condition, in the mean square sense, if and only if (33) holds.

Next, we will show that $(L_{1t}, L_{2t})$ converges under (33), for any initial condition, to the solution of (21) with probability 1 (i.e., a.s. convergence). We again assume for simplicity and w.l.o.g. that $S_1C_1' = 0$, $S_2C_2' = 0$. We will use the theorem in paragraph 3 of [11] (or Lemma 3.5 of [13]), which we restate here and which is an easy consequence of the martingale convergence theorem of Doob.


Lemma 1 Let $\{V_t\}$ be a sequence of random variables such that $E(V_t)$ exists. Let $A$ be a real number and suppose $V_t \ge A$. Furthermore, assume that $\sum_{t=1}^{+\infty}E\big(E[V_{t+1} - V_t\,|\,V_1,\ldots,V_t]^+\big)$ converges. Then the sequence $\{V_t\}$ converges with probability 1.

(Recall that if $x$ is a random variable, $x^+ = \frac{1}{2}(|x| + x)$.) Let $x_t = (\ell_{1t}',\ldots,\ell_{n_1t}',m_{1t}',\ldots,m_{n_2t}')'$. We will prove that $x_t$ converges to 0 w.p.1, or equivalently that $V_t = \|x_t\|^2$ does. Let $A = 0$. From (27) we can easily obtain (see Appendix C)

$$\left|E[V_{t+1} - V_t\,|\,V_1,\ldots,V_t]\right| \le \frac{a}{t}V_t$$

for some positive number $a$, and thus

$$E[V_{t+1} - V_t\,|\,V_1,\ldots,V_t]^+ \le \frac{a}{t}V_t$$

In order to fulfill the assumption of Lemma 1, it suffices to show that

$$\sum_{t=1}^{+\infty}\frac{1}{t}E(V_t) < +\infty \qquad (38)$$

It holds

$$E[V_t] = \mathrm{tr}\,N_t$$

and thus it suffices to show that

$$\sum_{t=1}^{+\infty}\frac{\mathrm{tr}\,N_t}{t} < +\infty \qquad (39)$$


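From (37) we obtain

$$N_{t+1} = N_1 - Q\sum_{k=1}^{t}\frac{N_k}{k} - \sum_{k=1}^{t}\frac{N_k}{k}Q' + \sum_{k=1}^{t}\frac{\mathcal{A}(N_k)}{k^2} \qquad (40)$$

If we assume that $Q$ has eigenvalues with positive real parts, (40) can be solved for $\sum_{k=1}^{t}\frac{N_k}{k}$ to yield

$$\mathrm{vec}\Big(\sum_{k=1}^{t}\frac{N_k}{k}\Big) = (I \otimes Q + Q \otimes I)^{-1}\,\mathrm{vec}\Big(N_1 - N_{t+1} + \sum_{k=1}^{t}\frac{\mathcal{A}(N_k)}{k^2}\Big)$$

where $\mathrm{vec}(\cdot)$ stacks the columns of its argument. Since $N_k$ converges, it is bounded, and so is $\sum_{k=1}^{t}\frac{\mathcal{A}(N_k)}{k^2}$. Thus $\sum_{k=1}^{t}\frac{N_k}{k}$ is uniformly bounded, and thus (39) and (38) are bounded. We thus conclude that $\|x_t\|^2 = V_t$ converges with probability 1. $x_t$ converges to 0 in the mean square sense by Theorem 2', and thus $\|x_t\|^2$ converges to 0 in probability and has a subsequence converging to zero with probability one ([17], Thm. 2.5.3, p. 93). Since we just showed that $\|x_t\|^2$ converges with probability one, this limit has to be zero. Let us now summarize the results of this section in a Theorem.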


Theorem: $L_{1t}$, $L_{2t}$ as defined by recursion (27) converge to a solution of (21) for any initial condition, in the mean square sense and with probability one, if and only if (33) holds.

(Under this condition (21) admits a unique solution.)

Remark 1 $N_t$ in (37) goes to zero, but it does not have to converge monotonically.


Remark 2 One can construct the stochastic analogues of the deterministic schemes of cases 1 and 2 if a different, appropriate approximation is used for $E[u_{2t}\,|\,y_{1t}]$ in (25). A little reflection, though, will persuade the reader that these schemes will converge under conditions more stringent than (33).

Remark  3  For  a  repeated  Stackelberg  game  one  can  consider  schemes  similar  to 
those  considered  here,  if  one  assumes  that  the  Leader  does  not  know  the 
parameters  involved  in  the  Follower's  cost.  An  idea  of  this  sort  was  recently 
studied in a deterministic framework in [8].


Remark 4 It should be clear from (30) and (37) that the rates of convergence of the means and the covariances of $\ell_{it}$, $m_{it}$ depend on the eigenvalues of the matrices in (32) for $\sigma_i = \sigma_1,\ldots,\sigma_{n_1}$, or equivalently of $Q$. Actually, a recursion of the form (A1) with $\lambda$ real and positive goes to zero like $n^{-\lambda}$ (see [12]). Thus if $\lambda_m$, $m = 1,\ldots,n_1+n_2$, denote the eigenvalues of $Q$ and $\lambda = \min_m \mathrm{Re}(\lambda_m)$, the mean converges no slower than $t^{-\lambda}$, the covariances no slower than $t^{-2\lambda}$, the third moments no slower than $t^{-3\lambda}$, and so on. Thus if one were to consider whether $t^{\beta}(L_{1t}, L_{2t})$ converges weakly to a Gaussian random variable as $t \to \infty$, $\beta$ should be chosen equal to $\lambda$ so that the second moments converge to a nonzero constant, but then automatically all the moments will also do so. Thus in general one cannot have asymptotic normality of $t^{\beta}(L_{1t}, L_{2t})$ for some $\beta > 0$. As a matter of fact, Theorem 1 of [12] cannot be applied, since its assumption (A4) fails for the stochastic approximation algorithm (27) considered here, as should be expected from the above remarks. Finally, it should be pointed out that the fact that the rate of convergence of the algorithm is given by $t^{-\lambda}$ and $t^{-2\lambda}$ for the first and second moments is useful when implementing it: in deciding when to stop, in estimating the probability of error when stopping after a finite number of iterations, etc.


Remark 5 Stochastic approximation has been an object of intensive study (see [9-15]). Several of the available results can be used to prove convergence of the iteration (27), but they demand conditions stronger than (33), or they are not applicable to it. For example, in [9] it is required that in the scheme $x_{n+1} = x_n - \frac{1}{n}y_n$, $y_n$ is uniformly bounded. Assumptions III and IV of [10] do not hold for (27). In proving asymptotic normality, [12] uses Assumption (A4), which does not hold for (27). Assumptions A5, A5' of [11] do not hold for our scheme. Lemma 3.1 and Theorem 4.3 of [13] can be applied to (27) but result in more stringent conditions than (33). The convergence analysis of [15] demands boundedness of the second term in (27), which does not hold in our case. Assumption iii in Problem 1, p. 92 of [14] does not hold for (27).


4.  CONCLUSIONS 


There  are  several  directions  in  which  this  research  can  be  continued. 

One  of  them  is  the  corresponding  problem  for  the  Stackelberg  game  (see  Remark  3 
in  Section  3).  The  dynamic  case  where  the  players  are  also  coupled  through  the 
evolution  of  a  discrete  time  equation  is  obviously  important  and  useful.  We 
hope  that  the  analysis  presented  here  will  be  helpful  in  such  further  research. 


APPENDIX  A 


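Lemma A1 Consider the scalar recursion

$$x_{n+1} = \left(1 - \frac{\lambda}{n}\right)x_n, \quad n = 1,2,3,\ldots \qquad (A1)$$

where $\lambda$ and $x_1$ are complex numbers. Then $x_n \to 0$ for any $x_1$ if and only if $\mathrm{Re}(\lambda) > 0$. (If we set $t_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n-1}$, we see that (A1) is a discrete approximation of $\dot x = -\lambda x$, and thus $\mathrm{Re}(\lambda) > 0$ is expected in order to have asymptotic stability of (A1).)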
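A quick numerical check of Lemma A1, and of the rate $n^{-\lambda}$ discussed at the end of this appendix, in plain Python with arbitrary illustrative values of $\lambda$ (`iterate_A1` is a hypothetical helper name):

```python
def iterate_A1(lam, n_max, x1=1.0):
    """Return x_{n_max} for recursion (A1): x_{n+1} = (1 - lam/n) x_n, n = 1, 2, ..."""
    x = complex(x1)
    for n in range(1, n_max):
        x *= 1 - lam / n
    return x

# Re(lam) > 0 and lam real: n^lam * x_n settles to a constant, i.e. x_n behaves like n^(-lam)
a, b = iterate_A1(0.7, 10_000), iterate_A1(0.7, 20_000)
print(b * 20_000 ** 0.7 / (a * 10_000 ** 0.7))   # close to 1
print(abs(iterate_A1(0.5j, 20_000)))               # Re(lam) = 0: |x_n| does not go to 0
print(abs(iterate_A1(1 + 2j, 20_000)))             # Re(lam) = 1: |x_n| small, about 1/n
```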

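Lemma A2 Consider the scalar recursion

$$x_{n+1} = \left(1 - \frac{\lambda}{n} + O\!\left(\frac{1}{n^2}\right)\right)x_n, \quad n = 1,2,3,\ldots \qquad (A2)$$

where $\lambda$ and $x_1$ are complex numbers. Then $x_n \to 0$ for any $x_1$ if and only if $\mathrm{Re}(\lambda) > 0$.

Proof

It is an immediate consequence of Lemma A1, since $\frac{\lambda}{n}$ dominates $O\!\left(\frac{1}{n^2}\right)$.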

Lemma A3 Consider the recursion

$$x_{n+1} = \left(I - \frac{A}{n} + O\!\left(\frac{1}{n^2}\right)\right)x_n, \quad n = 1,2,3,\ldots \qquad (A3)$$

where $A$ is a real square matrix and $x_1$ is a vector. Then $x_n \to 0$ for any $x_1$ if and only if $\mathrm{Re}\,\lambda(A) > 0$.

Proof We bring $A$ to its Jordan form and apply Lemma A2. It is helpful to notice that if $P$ is a real symmetric matrix,

$$x_{n+1}'Px_{n+1} = x_n'Px_n - \frac{1}{n}x_n'[PA + A'P]x_n + x_n'O\!\left(\frac{1}{n^2}\right)x_n$$

and thus if $A$ has $\mathrm{Re}\,\lambda(A) > 0$, we can find a positive definite $P$ so that $A'P + PA > 0$. Therefore, if $n$ is sufficiently large,

$$\frac{1}{n}x_n'[PA + A'P]x_n > x_n'O\!\left(\frac{1}{n^2}\right)x_n$$

and thus $x_{n+1}'Px_{n+1} < x_n'Px_n$, and consequently $x_n$ is bounded. This justifies the fact that the $\frac{1}{n}$ term dominates in (A3). $\square$
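The positive definite $P$ used in the proof of Lemma A3 can be obtained by solving the Lyapunov equation $A'P + PA = I$. The sketch below (NumPy assumed; `lyapunov_P` is a hypothetical helper, the matrix $A$ is illustrative, and dropping the $O(1/n^2)$ term is a simplification) solves it by vectorization and checks that $x_n'Px_n$ decreases along (A3) for large $n$.

```python
import numpy as np

def lyapunov_P(A):
    """Solve A'P + PA = I for P, using vec(A'P + PA) = (I kron A' + A' kron I) vec(P)."""
    k = A.shape[0]
    I = np.eye(k)
    K = np.kron(I, A.T) + np.kron(A.T, I)
    p = np.linalg.solve(K, I.reshape(-1, order="F"))
    P = p.reshape(k, k, order="F")
    return 0.5 * (P + P.T)               # symmetrise against round-off

A = np.array([[1.0, 5.0], [0.0, 2.0]])   # eigenvalues 1 and 2, not symmetric
P = lyapunov_P(A)
print(np.linalg.eigvalsh(P))             # both positive

# Along x_{n+1} = (I - A/n) x_n, i.e. (A3) without its O(1/n^2) term,
# the quadratic form x'Px decreases once n is large enough.
x, values = np.array([1.0, -1.0]), []
for n in range(1000, 1100):
    values.append(x @ P @ x)
    x = x - (A @ x) / n
```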


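Lemma A4 Consider the recursion

$$N_{t+1} = N_t - \frac{1}{t}\left[N_tQ' + QN_t\right] + \frac{1}{t^2}\mathcal{A}(N_t) \qquad (A4)$$

where $N_t$, $Q$ are square matrices. $N_t \to 0$ for any initial condition if and only if $\mathrm{Re}\,\lambda(Q) > 0$.

Proof Let $x_t$ be the vector composed of the columns of $N_t$. We can write the recursion equivalently as

$$x_{t+1} = x_t - \frac{1}{t}Ax_t + \frac{1}{t^2}\tilde{\mathcal{A}}(x_t), \qquad A = I \otimes Q + Q \otimes I$$

where $\tilde{\mathcal{A}}$ is the linear map induced by $\mathcal{A}$ on the stacked columns. It can be checked that $\mathrm{Re}\,\lambda(A) > 0$ if and only if $\mathrm{Re}\,\lambda(Q) > 0$ (the eigenvalues of $A$ are the sums $\lambda_i(Q) + \lambda_j(Q)$), and thus Lemma A3 can be applied. $\square$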
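The matrix $A$ in the proof of Lemma A4 is the Kronecker sum of $Q$ with itself. This sketch (NumPy assumed; random illustrative matrices and hypothetical helper names) checks the identity $\mathrm{vec}(NQ' + QN) = (I \otimes Q + Q \otimes I)\,\mathrm{vec}(N)$ and that the eigenvalues of $A$ are the pairwise sums $\lambda_i(Q) + \lambda_j(Q)$.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector."""
    return X.reshape(-1, order="F")

def kronecker_sum(Q):
    """Matrix A with vec(N Q' + Q N) = A vec(N) for every square N."""
    I = np.eye(Q.shape[0])
    return np.kron(I, Q) + np.kron(Q, I)

rng = np.random.default_rng(0)
Q, N = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
A = kronecker_sum(Q)
lam_Q, lam_A = np.linalg.eigvals(Q), np.linalg.eigvals(A)
pair_sums = (lam_Q[:, None] + lam_Q[None, :]).ravel()   # all lambda_i(Q) + lambda_j(Q)
print(np.allclose(vec(N @ Q.T + Q @ N), A @ vec(N)))   # True
```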


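It should be pointed out that if $x_n$ evolves as in (A1) and $\lambda$ is real, $x_n$ behaves like $n^{-\lambda}$ (see [12], eq. 2.3). If $\lambda$ is complex, then, with $a = \mathrm{Re}\,\lambda$, (A2) implies that $|x_n|^2$ behaves like $n^{-2a}$ (since $|x_{n+1}|^2 = (1 - \frac{2a}{n} + \frac{|\lambda|^2}{n^2})|x_n|^2$ is of the form (A2)), and thus $|x_n|$ behaves like $n^{-a}$, i.e., $n^{-\mathrm{Re}\,\lambda}$. Consequently, $x_n$ in (A3) behaves like $n^{-\lambda}$, where $\lambda = \min\mathrm{Re}\,\lambda(A)$, and $N_t$ in (A4) behaves like $t^{-2\lambda}$, where $\lambda = \min\mathrm{Re}\,\lambda(Q)$.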


APPENDIX  B 


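Let $\ell_{it}$, $m_{it}$, $c_i$, $d_i$ be as in (28). For convenience, let

$$y_{1,t-1} = \begin{bmatrix} y_1 \\ \vdots \\ y_{n_1} \end{bmatrix}, \qquad y_{2,t-1} = \begin{bmatrix} z_1 \\ \vdots \\ z_{n_2} \end{bmatrix} \qquad (B1)$$

(27) can be written as

$$\ell_{it} = \ell_{i,t-1} - \frac{1}{t-1}\Big[\ell_{i,t-1} + y_iR_1\sum_{j=1}^{n_2}z_jm_{j,t-1} + c_i\Big], \quad i = 1,\ldots,n_1 \qquad (B2)$$

$$m_{it} = m_{i,t-1} - \frac{1}{t-1}\Big[m_{i,t-1} + z_iR_2\sum_{j=1}^{n_1}y_j\ell_{j,t-1} + d_i\Big], \quad i = 1,\ldots,n_2 \qquad (B3)$$

For convenience, let us drop the subscript $t-1$ from $\ell_{i,t-1}$, $m_{i,t-1}$. From (B2), (B3), we obtain:

$$\begin{aligned} \ell_{it}\ell_{jt}' = {} & \ell_i\ell_j' - \frac{1}{t-1}\Big[2\ell_i\ell_j' + y_j\ell_i\sum_{k=1}^{n_2}z_km_k'R_1' + y_iR_1\sum_{k=1}^{n_2}z_km_k\ell_j' + \ell_ic_j' + c_i\ell_j'\Big] \\ & + \frac{1}{(t-1)^2}\Big[\ell_i\ell_j' + y_j\ell_i\sum_{k=1}^{n_2}z_km_k'R_1' + y_iR_1\sum_{k=1}^{n_2}z_km_k\ell_j' + y_iy_jR_1\sum_{k,r=1}^{n_2}z_kz_rm_km_r'R_1' \\ & \qquad + y_iR_1\sum_{k=1}^{n_2}z_km_kc_j' + y_jc_i\sum_{k=1}^{n_2}z_km_k'R_1' + \ell_ic_j' + c_i\ell_j' + c_ic_j'\Big], \quad i,j = 1,\ldots,n_1 \end{aligned} \qquad (B4)$$

$$\begin{aligned} m_{it}m_{jt}' = {} & m_im_j' - \frac{1}{t-1}\Big[2m_im_j' + z_jm_i\sum_{k=1}^{n_1}y_k\ell_k'R_2' + z_iR_2\sum_{k=1}^{n_1}y_k\ell_km_j' + m_id_j' + d_im_j'\Big] \\ & + \frac{1}{(t-1)^2}\Big[m_im_j' + z_jm_i\sum_{k=1}^{n_1}y_k\ell_k'R_2' + z_iR_2\sum_{k=1}^{n_1}y_k\ell_km_j' + z_iz_jR_2\sum_{k,r=1}^{n_1}y_ky_r\ell_k\ell_r'R_2' \\ & \qquad + z_iR_2\sum_{k=1}^{n_1}y_k\ell_kd_j' + z_jd_i\sum_{k=1}^{n_1}y_k\ell_k'R_2' + m_id_j' + d_im_j' + d_id_j'\Big], \quad i,j = 1,\ldots,n_2 \end{aligned} \qquad (B5)$$

$$\begin{aligned} \ell_{it}m_{jt}' = {} & \ell_im_j' - \frac{1}{t-1}\Big[2\ell_im_j' + z_j\ell_i\sum_{k=1}^{n_1}y_k\ell_k'R_2' + y_iR_1\sum_{k=1}^{n_2}z_km_km_j' + \ell_id_j' + c_im_j'\Big] \\ & + \frac{1}{(t-1)^2}\Big[\ell_im_j' + z_j\ell_i\sum_{k=1}^{n_1}y_k\ell_k'R_2' + y_iR_1\sum_{k=1}^{n_2}z_km_km_j' + y_iz_jR_1\sum_{k=1}^{n_2}\sum_{r=1}^{n_1}z_ky_rm_k\ell_r'R_2' \\ & \qquad + y_iR_1\sum_{k=1}^{n_2}z_km_kd_j' + z_jc_i\sum_{r=1}^{n_1}y_r\ell_r'R_2' + \ell_id_j' + c_im_j' + c_id_j'\Big], \quad i = 1,\ldots,n_1, \; j = 1,\ldots,n_2 \end{aligned} \qquad (B6)$$

Let $A_{ij}$, $M_{ij}$, $K_{ij}$ be defined as in (34), and let $c_i, d_i = 0$ for simplicity and w.l.o.g. We take expectations in (B4)-(B6) and, for convenience, drop the superscript $t-1$ from $A^{t-1}_{ij}$, $M^{t-1}_{ij}$, $K^{t-1}_{ij}$ on the right-hand side. (When taking expectations, we use the fact that $\ell_{i,t-1}$, $m_{i,t-1}$ are independent of $y_{1,t-1}$, $y_{2,t-1}$.) We obtain:

$$A^t_{ij} = A_{ij} - \frac{1}{t-1}\left[2A_{ij} + \sigma_jK_{ij}R_1' + \sigma_iR_1K_{ji}'\right] + \frac{1}{(t-1)^2}\left[A_{ij} + \sigma_jK_{ij}R_1' + \sigma_iR_1K_{ji}'\right] + \frac{1}{(t-1)^2}\begin{cases} \sigma_i\sigma_jR_1(M_{ij} + M_{ji})R_1', & i \neq j \\ R_1\Big(\sum_{k=1}^{n_2}M_{kk} + 2\sigma_i^2M_{ii}\Big)R_1', & i = j \end{cases}$$

$i,j = 1,\ldots,n_1$;

$$M^t_{ij} = M_{ij} - \frac{1}{t-1}\left[2M_{ij} + \sigma_jK_{ji}'R_2' + \sigma_iR_2K_{ij}\right] + \frac{1}{(t-1)^2}\mathcal{A}(A_{ij}\text{'s}, M_{ij}\text{'s}, K_{ij}\text{'s})$$

$i,j = 1,\ldots,n_2$, and $\sigma_i = 0$ if $i > n_1$, $\sigma_j = 0$ if $j > n_1$;

$$K^t_{ij} = K_{ij} - \frac{1}{t-1}\left[2K_{ij} + \sigma_jA_{ij}R_2' + \sigma_iR_1M_{ij}\right] + \frac{1}{(t-1)^2}\mathcal{A}(A_{ij}\text{'s}, M_{ij}\text{'s}, K_{ij}\text{'s})$$

$i = 1,\ldots,n_1$, $j = 1,\ldots,n_2$, and $\sigma_j = 0$ if $j > n_1$.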
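The expectations above rest on the Gaussian fourth-moment (Isserlis) formula $E[y_iy_jz_kz_r] = \delta_{ij}\delta_{kr} + \sigma_i\sigma_j(\delta_{ik}\delta_{jr} + \delta_{ir}\delta_{jk})$, which follows from (20). A Monte Carlo sketch (NumPy assumed; the correlations $\sigma = (0.8, 0.3)$, the matrices $C_1$, $C_2$ and the helper name are illustrative) compares it with sample averages:

```python
import numpy as np

def fourth_moment(i, j, k, r, sigma):
    """E[y_i y_j z_k z_r] for zero-mean jointly Gaussian y, z with E[y y'] = I,
    E[z z'] = I and E[y_a z_b] = sigma_a if a == b (zero otherwise), cf. (20)."""
    def cov_yy(a, b):
        return 1.0 if a == b else 0.0

    def cov_yz(a, b):
        return sigma[a] if (a == b and a < len(sigma)) else 0.0

    # Isserlis' theorem: sum over the three ways of pairing the four factors
    return (cov_yy(i, j) * cov_yy(k, r)
            + cov_yz(i, k) * cov_yz(j, r)
            + cov_yz(i, r) * cov_yz(j, k))

sigma = [0.8, 0.3]
C1 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0]])
C2 = np.array([[0.8, 0.0, 0.6, 0.0],
               [0.0, 0.3, 0.0, np.sqrt(1.0 - 0.3 ** 2)]])  # C2 C2' = I, C1 C2' = diag(sigma)
rng = np.random.default_rng(0)
X = rng.standard_normal((1_000_000, 4))
Y, Z = X @ C1.T, X @ C2.T            # samples of y = C1 x and z = C2 x
print(fourth_moment(0, 0, 0, 0, sigma), np.mean(Y[:, 0] ** 2 * Z[:, 0] ** 2))
```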


APPENDIX  C 


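Let $x_t = (\ell_{1t}',\ldots,\ell_{n_1t}',m_{1t}',\ldots,m_{n_2t}')'$. Using (27) or the equivalent (B2), (B3) we have

$$x_{t+1} = x_t - \frac{1}{t}R(y_{1t},y_{2t})x_t \qquad (C1)$$

where the definition of $R(y_{1t},y_{2t})$ is obvious from (B2), (B3). From (C1) we obtain

$$\|x_{t+1}\|^2 = \|x_t\|^2 - \frac{2}{t}x_t'R_tx_t + \frac{1}{t^2}x_t'R_t'R_tx_t \qquad (C2)$$

where $R_t = R(y_{1t},y_{2t})$. It holds

$$E\big[\|x_{t+1}\|^2 - \|x_t\|^2 \,\big|\, \|x_1\|^2,\ldots,\|x_t\|^2\big] = E\Big[E\big[\|x_{t+1}\|^2 - \|x_t\|^2 \,\big|\, x_1,\ldots,x_t\big] \,\Big|\, \|x_1\|^2,\ldots,\|x_t\|^2\Big] \qquad (C3)$$

$$\begin{aligned} E\big[x_t'R_tx_t \,\big|\, \|x_1\|^2,\ldots,\|x_t\|^2\big] &= E\Big[E\big[x_t'R_tx_t \,\big|\, x_1,\ldots,x_t\big] \,\Big|\, \|x_1\|^2,\ldots,\|x_t\|^2\Big] \\ &= E\Big[x_t'E\big[R_t \,\big|\, x_1,\ldots,x_t\big]x_t \,\Big|\, \|x_1\|^2,\ldots,\|x_t\|^2\Big] \\ &= E\big[x_t'\bar R_1x_t \,\big|\, \|x_1\|^2,\ldots,\|x_t\|^2\big] \end{aligned} \qquad (C4)$$

since $R_t$ depends only on $y_{1t}$, $y_{2t}$, which are independent of $x_1,\ldots,x_t$, where $\bar R_1$ is a constant matrix defined by

$$E[R(y_{1t},y_{2t})] = \bar R_1$$

Similarly,

$$E\big[x_t'R_t'R_tx_t \,\big|\, \|x_1\|^2,\ldots,\|x_t\|^2\big] = E\big[x_t'\bar R_2x_t \,\big|\, \|x_1\|^2,\ldots,\|x_t\|^2\big] \qquad (C5)$$

where $\bar R_2$ is a constant matrix defined by

$$E[R'(y_{1t},y_{2t})R(y_{1t},y_{2t})] = \bar R_2$$

From (C3)-(C5) we obtain:

$$E\big[\|x_{t+1}\|^2 - \|x_t\|^2 \,\big|\, \|x_1\|^2,\ldots,\|x_t\|^2\big] = E\Big[x_t'\Big(-\frac{2}{t}\bar R_1 + \frac{1}{t^2}\bar R_2\Big)x_t \,\Big|\, \|x_1\|^2,\ldots,\|x_t\|^2\Big]$$

It holds

$$\Big|x_t'\Big(-\frac{2}{t}\bar R_1 + \frac{1}{t^2}\bar R_2\Big)x_t\Big| \le \frac{a}{t}\|x_t\|^2$$

for some positive constant $a$ and thus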
of an opponent. The existing work on adaptive control (i.e., on single-objective problems) constitutes an important background for adaptive games, which nonetheless introduce new challenging concepts and technical difficulties. We are currently continuing this line of research.